
Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans. Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are five main steps in hypothesis testing:

  1. State your research hypothesis as a null hypothesis (H0) and an alternate hypothesis (Ha or H1).
  2. Collect data in a way designed to test the hypothesis.
  3. Perform an appropriate statistical test.
  4. Decide whether to reject or fail to reject your null hypothesis.
  5. Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

  • Step 1: State your null and alternate hypothesis
  • Step 2: Collect data
  • Step 3: Perform a statistical test
  • Step 4: Decide whether to reject or fail to reject your null hypothesis
  • Step 5: Present your findings
  • Other interesting articles
  • Frequently asked questions about hypothesis testing

Step 1: State your null and alternate hypothesis

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H0) and alternate (Ha) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H0: Men are, on average, not taller than women.
  • Ha: Men are, on average, taller than women.


Step 2: Collect data

For a statistical test to be valid, it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

Step 3: Perform a statistical test

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p -value . This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p -value. This means it is likely that any difference you measure between groups is due to chance.
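As an illustrative sketch of this comparison (the numbers below are invented, not from the article), the within- versus between-group variances can be computed directly:

```python
from statistics import mean, variance

# Hypothetical measurements for two categories (invented numbers)
groups = {
    "A": [4.1, 4.3, 3.9, 4.0],
    "B": [6.2, 6.0, 6.4, 6.1],
}

grand_mean = mean(v for g in groups.values() for v in g)

# Between-group variance: how far each group mean sits from the grand mean
between = sum(
    len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
) / (len(groups) - 1)

# Within-group variance: the average spread inside each group
within = mean(variance(g) for g in groups.values())

# A large ratio means the groups differ far more than their internal
# spread would suggest, which corresponds to a low p-value
f_ratio = between / within
```

With equal group sizes this ratio is essentially the F statistic a one-way ANOVA computes; the larger it is, the smaller the p-value.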

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data .

For example, a t test comparing the average heights of men and women will give you:

  • an estimate of the difference in average height between the two groups;
  • a p-value showing how likely you are to see this difference if the null hypothesis of no difference is true.

Step 4: Decide whether to reject or fail to reject your null hypothesis

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p-value generated by your statistical test to guide your decision. Your predetermined level of significance for rejecting the null hypothesis will usually be 0.05, meaning you reject the null hypothesis when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).
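As a sketch of steps 3 and 4 using the article's heights example (the data below are simulated, not real measurements), a simple large-sample test might look like:

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)
# Simulated height samples in cm (invented numbers for illustration)
men = [random.gauss(178, 7) for _ in range(200)]
women = [random.gauss(165, 7) for _ in range(200)]

# Two-sample z statistic: difference in means over its standard error
diff = mean(men) - mean(women)
se = (stdev(men) ** 2 / len(men) + stdev(women) ** 2 / len(women)) ** 0.5
z = diff / se

# One-sided p-value for Ha: men are, on average, taller than women
p_value = 1 - NormalDist().cdf(z)

alpha = 0.05  # predetermined level of significance
decision = "reject H0" if p_value < alpha else "fail to reject H0"
```

With samples this large and this far apart, the p-value is far below 0.05, so the null hypothesis of no height difference is rejected.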


Step 5: Present your findings

The results of hypothesis testing will be presented in the results and discussion sections of your research paper, dissertation or thesis.

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p -value). In the discussion , you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis . This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .

Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

Statistics

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Frequently asked questions about hypothesis testing

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Cite this Scribbr article


Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved July 22, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/



8.1 The Elements of Hypothesis Testing

Learning objectives.

  • To understand the logical framework of tests of hypotheses.
  • To learn basic terminology connected with hypothesis testing.
  • To learn fundamental facts about hypothesis testing.

Types of Hypotheses

A hypothesis about a population parameter is an assertion about its value. As in the introductory example, we will be concerned with testing the truth of two competing hypotheses, only one of which can be true.

The null hypothesis, denoted H0, is the statement about the population parameter that is assumed to be true unless there is convincing evidence to the contrary.

The alternative hypothesis, denoted Ha, is a statement about the population parameter that is contradictory to the null hypothesis, and is accepted as true only if there is convincing evidence in favor of it.

Hypothesis testing is a statistical procedure in which a choice is made between a null hypothesis and an alternative hypothesis based on information in a sample.

The end result of a hypothesis-testing procedure is a choice of one of the following two possible conclusions:

  • Reject H 0 (and therefore accept H a ), or
  • Fail to reject H 0 (and therefore fail to accept H a ).

The null hypothesis typically represents the status quo, or what has historically been true. In the example of the respirators, we would believe the claim of the manufacturer unless there is reason not to do so, so the null hypothesis is H0: μ = 75. The alternative hypothesis in the example is the contradictory statement Ha: μ < 75. The null hypothesis will always be an assertion containing an equals sign, but depending on the situation the alternative hypothesis can have any one of three forms: with the symbol "<," as in the example just discussed, with the symbol ">," or with the symbol "≠." The following two examples illustrate the latter two cases.

A publisher of college textbooks claims that the average price of all hardbound college textbooks is $127.50. A student group believes that the actual mean is higher and wishes to test their belief. State the relevant null and alternative hypotheses.

The default option is to accept the publisher’s claim unless there is compelling evidence to the contrary. Thus the null hypothesis is H 0 : μ = 127.50 . Since the student group thinks that the average textbook price is greater than the publisher’s figure, the alternative hypothesis in this situation is H a : μ > 127.50 .

The recipe for a bakery item is designed to result in a product that contains 8 grams of fat per serving. The quality control department samples the product periodically to ensure that the production process is working as designed. State the relevant null and alternative hypotheses.

The default option is to assume that the product contains the amount of fat it was formulated to contain unless there is compelling evidence to the contrary. Thus the null hypothesis is H 0 : μ = 8.0 . Since to contain either more fat than desired or to contain less fat than desired are both an indication of a faulty production process, the alternative hypothesis in this situation is that the mean is different from 8.0, so H a : μ ≠ 8.0 .

In Note 8.8 "Example 1" , the textbook example, it might seem more natural that the publisher’s claim be that the average price is at most $127.50, not exactly $127.50. If the claim were made this way, then the null hypothesis would be H 0 : μ ≤ 127.50 , and the value $127.50 given in the example would be the one that is least favorable to the publisher’s claim, the null hypothesis. It is always true that if the null hypothesis is retained for its least favorable value, then it is retained for every other value.

Thus in order to make the null and alternative hypotheses easy for the student to distinguish, in every example and problem in this text we will always present one of the two competing claims about the value of a parameter with an equality. The claim expressed with an equality is the null hypothesis. This is the same as always stating the null hypothesis in the least favorable light. So in the introductory example about the respirators, we stated the manufacturer’s claim as “the average is 75 minutes” instead of the perhaps more natural “the average is at least 75 minutes,” essentially reducing the presentation of the null hypothesis to its worst case.

The first step in hypothesis testing is to identify the null and alternative hypotheses.

The Logic of Hypothesis Testing

Although we will study hypothesis testing in situations other than for a single population mean (for example, for a population proportion instead of a mean or in comparing the means of two different populations), in this section the discussion will always be given in terms of a single population mean μ .

The null hypothesis always has the form H 0 : μ = μ 0 for a specific number μ 0 (in the respirator example μ 0 = 75 , in the textbook example μ 0 = 127.50 , and in the baked goods example μ 0 = 8.0 ). Since the null hypothesis is accepted unless there is strong evidence to the contrary, the test procedure is based on the initial assumption that H 0 is true. This point is so important that we will repeat it in a display:

The test procedure is based on the initial assumption that H 0 is true.

The criterion for judging between H0 and Ha based on the sample data is: if the value of X̄ would be highly unlikely to occur if H0 were true, but favors the truth of Ha, then we reject H0 in favor of Ha. Otherwise we do not reject H0.

Supposing for now that X̄ follows a normal distribution, when the null hypothesis is true the density function for the sample mean X̄ must be as in Figure 8.1 "The Density Curve for X̄": a bell curve centered at μ0. Thus if H0 is true then X̄ is likely to take a value near μ0 and is unlikely to take values far away. Our decision procedure therefore reduces simply to:

  • if Ha has the form Ha: μ < μ0 then reject H0 if x̄ is far to the left of μ0;
  • if Ha has the form Ha: μ > μ0 then reject H0 if x̄ is far to the right of μ0;
  • if Ha has the form Ha: μ ≠ μ0 then reject H0 if x̄ is far away from μ0 in either direction.

Figure 8.1 The Density Curve for X̄ if H0 Is True


Think of the respirator example, for which the null hypothesis is H 0 : μ = 75 , the claim that the average time air is delivered for all respirators is 75 minutes. If the sample mean is 75 or greater then we certainly would not reject H 0 (since there is no issue with an emergency respirator delivering air even longer than claimed).

If the sample mean is slightly less than 75 then we would logically attribute the difference to sampling error and also not reject H 0 either.

Values of the sample mean that are smaller and smaller are less and less likely to come from a population for which the population mean is 75. Thus if the sample mean is far less than 75, say around 60 minutes or less, then we would certainly reject H0, because we know that it is highly unlikely that the average of a sample would be so low if the population mean were 75. This is the rare event criterion for rejection: what we actually observed (x̄ < 60) would be so rare an event if μ = 75 were true that we regard it as much more likely that the alternative hypothesis μ < 75 holds.

In summary, to decide between H0 and Ha in this example we would select a "rejection region" of values sufficiently far to the left of 75, based on the rare event criterion, and reject H0 if the sample mean X̄ lies in the rejection region, but not reject H0 if it does not.

A rejection region is an interval or union of intervals such that the null hypothesis is rejected if and only if the statistic of interest lies in this region.

The Rejection Region

Each different form of the alternative hypothesis H a has its own kind of rejection region:

  • if (as in the respirator example) Ha has the form Ha: μ < μ0, we reject H0 if x̄ is far to the left of μ0, that is, to the left of some number C, so the rejection region has the form of an interval (−∞, C];
  • if (as in the textbook example) Ha has the form Ha: μ > μ0, we reject H0 if x̄ is far to the right of μ0, that is, to the right of some number C, so the rejection region has the form of an interval [C, ∞);
  • if (as in the baked goods example) Ha has the form Ha: μ ≠ μ0, we reject H0 if x̄ is far away from μ0 in either direction, that is, either to the left of some number C or to the right of some other number C′, so the rejection region has the form of the union of two intervals (−∞, C] ∪ [C′, ∞).

The key issue in our line of reasoning is the question of how to determine the number C or numbers C and C ′, called the critical value or critical values of the statistic, that determine the rejection region.

The critical value or critical values of a test of hypotheses are the number or numbers that determine the rejection region.

Suppose the rejection region is a single interval, so we need to select a single number C. Here is the procedure for doing so. We select a small probability, denoted α, say 1%, which we take as our definition of "rare event": an event is "rare" if its probability of occurrence is less than α. (In all the examples and problems in this text the value of α will be given already.) The probability that X̄ takes a value in an interval is the area under its density curve and above that interval, so as shown in Figure 8.2 (drawn under the assumption that H0 is true, so that the curve centers at μ0) the critical value C is the value of X̄ that cuts off a tail area α in the probability density curve of X̄. When the rejection region is in two pieces, that is, composed of two intervals, the total area above both of them must be α, so the area above each one is α∕2, as also shown in Figure 8.2.


The number α is the total area of a tail or a pair of tails.

In the context of Note 8.9 "Example 2" , suppose that it is known that the population is normally distributed with standard deviation σ = 0.15 gram, and suppose that the test of hypotheses H 0 : μ = 8.0 versus H a : μ ≠ 8.0 will be performed with a sample of size 5. Construct the rejection region for the test for the choice α = 0.10 . Explain the decision procedure and interpret it.

If H0 is true then the sample mean X̄ is normally distributed with mean μX̄ = 8.0 and standard deviation

σX̄ = σ∕√n = 0.15∕√5 ≈ 0.067

Since Ha contains the ≠ symbol, the rejection region will be in two pieces, each one corresponding to a tail of area α∕2 = 0.10∕2 = 0.05. From Figure 12.3 "Critical Values of z", z0.05 = 1.645, so C and C′ are 1.645 standard deviations of X̄ to the left and right of its mean 8.0:

C = 8.0 − (1.645)(0.067) ≈ 7.89 and C′ = 8.0 + (1.645)(0.067) ≈ 8.11

The result is shown in Figure 8.3 "Rejection Region for the Choice " .

Figure 8.3 Rejection Region for the Choice α = 0.10

what are the elements of hypothesis testing

The decision procedure is: take a sample of size 5 and compute the sample mean x̄. If x̄ is either 7.89 grams or less or 8.11 grams or more then reject the hypothesis that the average amount of fat in all servings of the product is 8.0 grams in favor of the alternative that it is different from 8.0 grams. Otherwise do not reject the hypothesis that the average amount is 8.0 grams.

The reasoning is that if the true average amount of fat per serving were 8.0 grams then there would be less than a 10% chance that a sample of size 5 would produce a mean of either 7.89 grams or less or 8.11 grams or more. Hence if that happened it would be more likely that the value 8.0 is incorrect (always assuming that the population standard deviation is 0.15 gram).
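The rejection region above can be reproduced numerically. This sketch uses only the standard library and the numbers given in the example:

```python
from statistics import NormalDist

mu0 = 8.0     # null-hypothesis mean (grams of fat per serving)
sigma = 0.15  # known population standard deviation
n = 5         # sample size
alpha = 0.10  # significance level

se = sigma / n ** 0.5                         # standard deviation of the sample mean
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.645 for alpha = 0.10
C_low = mu0 - z_crit * se                     # ~7.89
C_high = mu0 + z_crit * se                    # ~8.11

def decide(xbar):
    # Reject H0 exactly when the sample mean falls in the rejection region
    if xbar <= C_low or xbar >= C_high:
        return "reject H0"
    return "do not reject H0"
```

For instance, a sample mean of 7.8 grams falls in the rejection region, while 8.02 grams does not.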

Because the rejection regions are computed based on areas in tails of distributions, as shown in Figure 8.2 , hypothesis tests are classified according to the form of the alternative hypothesis in the following way.

If H a has the form μ ≠ μ 0 the test is called a two-tailed test .

If H a has the form μ < μ 0 the test is called a left-tailed test .

If H a has the form μ > μ 0 the test is called a right-tailed test .

Each of the last two forms is also called a one-tailed test .

Two Types of Errors

The format of the testing procedure in general terms is to take a sample and use the information it contains to come to a decision about the two hypotheses. As stated before our decision will always be either

  • reject the null hypothesis H 0 in favor of the alternative H a presented, or
  • do not reject the null hypothesis H 0 in favor of the alternative H a presented.

There are four possible outcomes of hypothesis testing procedure, as shown in the following table:

                          True State of Nature
Our Decision              H0 is true          H0 is false
Do not reject H0          Correct decision    Type II error
Reject H0                 Type I error        Correct decision

As the table shows, there are two ways to be right and two ways to be wrong. Typically to reject H 0 when it is actually true is a more serious error than to fail to reject it when it is false, so the former error is labeled “Type I” and the latter error “Type II.”

In a test of hypotheses, a Type I error is the decision to reject H0 when it is in fact true. A Type II error is the decision not to reject H0 when it is in fact false.

Unless we perform a census we do not have certain knowledge, so we do not know whether our decision matches the true state of nature or if we have made an error. We reject H 0 if what we observe would be a “rare” event if H 0 were true. But rare events are not impossible: they occur with probability α . Thus when H 0 is true, a rare event will be observed in the proportion α of repeated similar tests, and H 0 will be erroneously rejected in those tests. Thus α is the probability that in following the testing procedure to decide between H 0 and H a we will make a Type I error.

The number α that is used to determine the rejection region is called the level of significance of the test. It is the probability that the test procedure will result in a Type I error.

The probability of making a Type II error is too complicated to discuss in a beginning text, so we will say no more about it than this: for a fixed sample size, choosing α smaller in order to reduce the chance of making a Type I error has the effect of increasing the chance of making a Type II error. The only way to simultaneously reduce the chances of making either kind of error is to increase the sample size.
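The claim that α equals the Type I error rate can be checked by simulation. This sketch repeatedly draws samples from a population where H0 is actually true, reusing the baked-goods numbers for concreteness:

```python
import random
from statistics import NormalDist, mean

random.seed(1)
mu0, sigma, n, alpha = 8.0, 0.15, 5, 0.10
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

def type_i_error():
    # Draw a sample from a population where H0 is true (mu really is mu0)
    xbar = mean(random.gauss(mu0, sigma) for _ in range(n))
    z = (xbar - mu0) / (sigma / n ** 0.5)
    return abs(z) >= z_crit  # True means we wrongly rejected H0

# The long-run rejection rate under H0 should come out close to alpha
error_rate = mean(type_i_error() for _ in range(10_000))
```

Over 10,000 simulated tests the erroneous-rejection rate lands near 0.10, matching the chosen level of significance.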

Standardizing the Test Statistic

Hypothesis testing will be considered in a number of contexts, and great unification as well as simplification results when the relevant sample statistic is standardized by subtracting its mean from it and then dividing by its standard deviation. The resulting statistic is called a standardized test statistic. In every situation treated in this and the following two chapters the standardized test statistic will have either the standard normal distribution or Student's t-distribution.

A standardized test statistic for a hypothesis test is the statistic that is formed by subtracting from the statistic of interest its mean and dividing by its standard deviation.

For example, reviewing Note 8.14 "Example 3", if instead of working with the sample mean X̄ we work with the test statistic

Z = (X̄ − 8.0) ∕ (σ∕√n) = (X̄ − 8.0) ∕ (0.15∕√5)

then the distribution involved is standard normal and the critical values are just ±z0.05. The extra work that was done to find that C = 7.89 and C′ = 8.11 is eliminated. In every hypothesis test in this book the standardized test statistic will be governed by either the standard normal distribution or Student's t-distribution. Information about rejection regions is summarized in the following tables:

When the test statistic has the standard normal distribution:

Symbol in Ha    Terminology         Rejection Region
<               Left-tailed test    (−∞, −zα]
>               Right-tailed test   [zα, ∞)
≠               Two-tailed test     (−∞, −zα∕2] ∪ [zα∕2, ∞)

When the test statistic has Student's t-distribution:

Symbol in Ha    Terminology         Rejection Region
<               Left-tailed test    (−∞, −tα]
>               Right-tailed test   [tα, ∞)
≠               Two-tailed test     (−∞, −tα∕2] ∪ [tα∕2, ∞)

Every instance of hypothesis testing discussed in this and the following two chapters will have a rejection region like one of the six forms tabulated in the tables above.
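As a sketch of the standardized approach, reusing the baked-goods numbers, the computation of C and C′ is skipped entirely by comparing Z directly to the critical values:

```python
from statistics import NormalDist

mu0, sigma, n, alpha = 8.0, 0.15, 5, 0.10

def decide(xbar):
    # Standardized test statistic: Z = (xbar - mu0) / (sigma / sqrt(n))
    z = (xbar - mu0) / (sigma / n ** 0.5)
    # Two-tailed test: reject when |Z| is at least z_{alpha/2}
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return "reject H0" if abs(z) >= z_crit else "do not reject H0"
```

A sample mean of 7.8 grams gives Z ≈ −2.98 and a rejection, while 8.05 grams gives Z ≈ 0.75 and no rejection, matching the unstandardized decision rule.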

No matter what the context a test of hypotheses can always be performed by applying the following systematic procedure, which will be illustrated in the examples in the succeeding sections.

Systematic Hypothesis Testing Procedure: Critical Value Approach

  1. Identify the null and alternative hypotheses.
  2. Identify the relevant test statistic and its distribution.
  3. Compute from the data the value of the test statistic.
  4. Construct the rejection region.
  5. Compare the value computed in Step 3 to the rejection region constructed in Step 4 and make a decision. Formulate the decision in the context of the problem, if applicable.

The procedure that we have outlined in this section is called the “Critical Value Approach” to hypothesis testing to distinguish it from an alternative but equivalent approach that will be introduced at the end of Section 8.3 "The Observed Significance of a Test" .

Key Takeaways

  • A test of hypotheses is a statistical process for deciding between two competing assertions about a population parameter.
  • The testing procedure is formalized in a five-step procedure.

State the null and alternative hypotheses for each of the following situations. (That is, identify the correct number μ 0 and write H 0 : μ = μ 0 and the appropriate analogous expression for H a .)

  • The average July temperature in a region historically has been 74.5°F. Perhaps it is higher now.
  • The average weight of a female airline passenger with luggage was 145 pounds ten years ago. The FAA believes it to be higher now.
  • The average stipend for doctoral students in a particular discipline at a state university is $14,756. The department chairman believes that the national average is higher.
  • The average room rate in hotels in a certain region is $82.53. A travel agent believes that the average in a particular resort area is different.
  • The average farm size in a predominately rural state was 69.4 acres. The secretary of agriculture of that state asserts that it is less today.
  • The average time workers spent commuting to work in Verona five years ago was 38.2 minutes. The Verona Chamber of Commerce asserts that the average is less now.
  • The mean salary for all men in a certain profession is $58,291. A special interest group thinks that the mean salary for women in the same profession is different.
  • The accepted figure for the caffeine content of an 8-ounce cup of coffee is 133 mg. A dietitian believes that the average for coffee served in local restaurants is higher.
  • The average yield per acre for all types of corn in a recent year was 161.9 bushels. An economist believes that the average yield per acre is different this year.
  • An industry association asserts that the average age of all self-described fly fishermen is 42.8 years. A sociologist suspects that it is higher.

Describe the two types of errors that can be made in a test of hypotheses.

Under what circumstance is a test of hypotheses certain to yield a correct decision?

  • H 0 : μ = 74.5 vs. H a : μ > 74.5
  • H 0 : μ = 145 vs. H a : μ > 145
  • H 0 : μ = 14756 vs. H a : μ > 14756
  • H 0 : μ = 82.53 vs. H a : μ ≠ 82.53
  • H 0 : μ = 69.4 vs. H a : μ < 69.4

A Type I error is made when a true H 0 is rejected. A Type II error is made when a false H 0 is not rejected.

Statistics By Jim

Making statistics intuitive

Statistical Hypothesis Testing Overview

By Jim Frost

In this blog post, I explain why you need to use statistical hypothesis testing and help you navigate the essential terminology. Hypothesis testing is a crucial procedure to perform when you want to make inferences about a population using a random sample. These inferences include estimating population properties such as the mean, differences between means, proportions, and the relationships between variables.

This post provides an overview of statistical hypothesis testing. If you need to perform hypothesis tests, consider getting my book, Hypothesis Testing: An Intuitive Guide .

Why You Should Perform Statistical Hypothesis Testing

Graph that displays mean drug scores by group. Use hypothesis testing to determine whether the difference between the means is statistically significant.

Hypothesis testing is a form of inferential statistics that allows us to draw conclusions about an entire population based on a representative sample. You gain tremendous benefits by working with a sample. In most cases, it is simply impossible to observe the entire population to understand its properties. The only alternative is to collect a random sample and then use statistics to analyze it.

While samples are much more practical and less expensive to work with, there are trade-offs. When you estimate the properties of a population from a sample, the sample statistics are unlikely to equal the actual population value exactly. For instance, your sample mean is unlikely to equal the population mean. The difference between the sample statistic and the population value is the sampling error.

Differences that researchers observe in samples might be due to sampling error rather than representing a true effect at the population level. If sampling error causes the observed difference, the next time someone performs the same experiment the results might be different. Hypothesis testing incorporates estimates of the sampling error to help you make the correct decision. Learn more about Sampling Error .

For example, if you are studying the proportion of defects produced by two manufacturing methods, any difference you observe between the two sample proportions might be sample error rather than a true difference. If the difference does not exist at the population level, you won’t obtain the benefits that you expect based on the sample statistics. That can be a costly mistake!
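As a sketch of the defect example (the counts below are invented; a pooled two-proportion z-test is one standard choice for comparing two defect rates):

```python
from statistics import NormalDist

# Invented defect counts for two manufacturing methods
defects_a, n_a = 30, 1000
defects_b, n_b = 45, 1000

p_a, p_b = defects_a / n_a, defects_b / n_b
p_pool = (defects_a + defects_b) / (n_a + n_b)  # pooled proportion under H0

# Standard error of the difference in proportions, assuming H0 is true
se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
z = (p_a - p_b) / se

# Two-sided p-value: probability of a difference this large under H0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
```

With these particular counts the p-value comes out a bit above 0.05, so the apparent difference in defect rates could plausibly be sampling error rather than a true difference.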

Let’s cover some basic hypothesis testing terms that you need to know.

Background information : Difference between Descriptive and Inferential Statistics and Populations, Parameters, and Samples in Inferential Statistics

Hypothesis Testing

Hypothesis testing is a statistical analysis that uses sample data to assess two mutually exclusive theories about the properties of a population. Statisticians call these theories the null hypothesis and the alternative hypothesis. A hypothesis test assesses your sample statistic and factors in an estimate of the sampling error to determine which hypothesis the data support.

When you can reject the null hypothesis, the results are statistically significant, and your data support the theory that an effect exists at the population level.

The effect is the difference between the population value and the null hypothesis value. The effect is also known as population effect or the difference. For example, the mean difference between the health outcome for a treatment group and a control group is the effect.

Typically, you do not know the size of the actual effect. However, you can use a hypothesis test to help you determine whether an effect exists and to estimate its size. Hypothesis tests convert your sample effect into a test statistic, which it evaluates for statistical significance. Learn more about Test Statistics .

An effect can be statistically significant, but that doesn’t necessarily indicate that it is important in a real-world, practical sense. For more information, read my post about Statistical vs. Practical Significance .

Null Hypothesis

The null hypothesis is one of two mutually exclusive theories about the properties of the population in hypothesis testing. Typically, the null hypothesis states that there is no effect (i.e., the effect size equals zero). The null is often signified by H₀.

In all hypothesis testing, the researchers are testing an effect of some sort. The effect can be the effectiveness of a new vaccination, the durability of a new product, the proportion of defects in a manufacturing process, and so on. There is some benefit or difference that the researchers hope to identify.

However, it’s possible that there is no effect or no difference between the experimental groups. In statistics, we call this lack of an effect the null hypothesis. Therefore, if you can reject the null, you can favor the alternative hypothesis, which states that the effect exists (doesn’t equal zero) at the population level.

You can think of the null as the default theory, which requires sufficiently strong evidence against it before you can reject it.

For example, in a 2-sample t-test, the null often states that the difference between the two means equals zero.
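As a sketch of that null in practice, here is how a 2-sample t-test of "the difference between the two means equals zero" might look with SciPy. The height data are simulated and the group means are my own illustrative assumptions, not figures from the post.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical height samples (cm) for two groups; the means, spread, and
# sample sizes are illustrative assumptions.
men = rng.normal(178, 7, size=40)
women = rng.normal(165, 7, size=40)

# Null hypothesis: the difference between the two population means is zero.
result = stats.ttest_ind(men, women)
print(result.statistic, result.pvalue)
```

With a simulated 13 cm gap between the groups, the test statistic is large and the p-value is tiny, so this sample provides strong evidence against the null.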

When you can reject the null hypothesis, your results are statistically significant. Learn more about Statistical Significance: Definition & Meaning .

Related post : Understanding the Null Hypothesis in More Detail

Alternative Hypothesis

The alternative hypothesis is the other theory about the properties of the population in hypothesis testing. Typically, the alternative hypothesis states that a population parameter does not equal the null hypothesis value. In other words, there is a non-zero effect. If your sample contains sufficient evidence, you can reject the null and favor the alternative hypothesis. The alternative is often identified with H₁ or Hₐ.

For example, in a 2-sample t-test, the alternative often states that the difference between the two means does not equal zero.

You can specify either a one- or two-tailed alternative hypothesis:

If you perform a two-tailed hypothesis test, the alternative states that the population parameter does not equal the null value. For example, when the alternative hypothesis is Hₐ: μ ≠ 0, the test can detect differences both greater than and less than the null value.

A one-tailed alternative has more power to detect an effect, but it can test for a difference in only one direction. For example, Hₐ: μ > 0 can only test for differences that are greater than zero.
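SciPy's t-test functions expose this one- vs. two-tailed choice through the `alternative` argument. A small sketch with simulated data (the values are illustrative) shows that when the estimate falls in the hypothesized direction, the one-tailed p-value is half the two-tailed one, reflecting the extra power in that direction.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(0.5, 1.0, size=50)  # simulated data with true mean 0.5

# Two-tailed: H_A is mu != 0, detecting differences in either direction.
two_tailed = stats.ttest_1samp(sample, popmean=0, alternative='two-sided')

# One-tailed: H_A is mu > 0, with more power but in one direction only.
one_tailed = stats.ttest_1samp(sample, popmean=0, alternative='greater')

print(two_tailed.pvalue, one_tailed.pvalue)
```

The test statistic itself is identical for both versions; only the way the p-value is computed from it changes.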

Related posts : Understanding T-tests and One-Tailed and Two-Tailed Hypothesis Tests Explained

P-values

P-values are the probability that you would obtain the effect observed in your sample, or larger, if the null hypothesis is correct. In simpler terms, p-values tell you how strongly your sample data contradict the null. Lower p-values represent stronger evidence against the null. You use P-values in conjunction with the significance level to determine whether your data favor the null or alternative hypothesis.

Related post : Interpreting P-values Correctly

Significance Level (Alpha)

The significance level, denoted by alpha (α), is the threshold you set before the study: it is the probability of rejecting the null hypothesis when it is actually true.

For instance, a significance level of 0.05 signifies a 5% risk of deciding that an effect exists when it does not exist.

Use p-values and significance levels together to help you determine which hypothesis the data support. If the p-value is less than your significance level, you can reject the null and conclude that the effect is statistically significant. In other words, the evidence in your sample is strong enough to reject the null hypothesis and conclude that the effect exists at the population level.
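The decision rule in the paragraph above fits in a few lines of code. This is a minimal sketch of the standard rule; the wording of the messages is mine.

```python
def hypothesis_decision(p_value, alpha=0.05):
    """Apply the standard decision rule: reject the null when p < alpha."""
    if p_value < alpha:
        return "Reject the null hypothesis: the effect is statistically significant."
    return "Fail to reject the null hypothesis: the evidence is not strong enough."

print(hypothesis_decision(0.03))  # p-value below alpha
print(hypothesis_decision(0.20))  # p-value above alpha
```

Note that the comparison is always against the alpha chosen before the study, not an alpha picked after seeing the p-value.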

Related posts : Graphical Approach to Significance Levels and P-values and Conceptual Approach to Understanding Significance Levels

Types of Errors in Hypothesis Testing

Statistical hypothesis tests are not 100% accurate because they use a random sample to draw conclusions about entire populations. There are two types of errors related to drawing an incorrect conclusion.

  • False positives: You reject a null that is true. Statisticians call this a Type I error . The Type I error rate equals your significance level or alpha (α).
  • False negatives: You fail to reject a null that is false. Statisticians call this a Type II error. Generally, you do not know the Type II error rate. However, it is a larger risk when you have a small sample size , noisy data, or a small effect size. The type II error rate is also known as beta (β).

Statistical power is the probability that a hypothesis test correctly detects an effect that exists in the population. In other words, the test correctly rejects a false null hypothesis. Consequently, power is inversely related to the Type II error rate: Power = 1 – β. Learn more about Power in Statistics .
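You can estimate Power = 1 – β directly by simulation. This sketch (the effect size, sample size, and number of simulations are my own arbitrary choices) repeatedly runs a two-sample t-test on data where a true effect exists and counts how often the test correctly rejects the null.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

alpha = 0.05      # significance level (Type I error rate)
n = 30            # sample size per group (arbitrary choice)
effect = 0.8      # true standardized effect size (arbitrary choice)
n_sims = 2000

# Count how often the test correctly rejects the false null hypothesis.
rejections = 0
for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(effect, 1.0, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1

power = rejections / n_sims   # simulation estimate of 1 - beta
print(power)
```

Shrinking the sample size, the effect size, or adding noise in this simulation lowers the estimated power, matching the risk factors for Type II errors listed above.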

Related posts : Types of Errors in Hypothesis Testing and Estimating a Good Sample Size for Your Study Using Power Analysis

Which Type of Hypothesis Test is Right for You?

There are many different types of procedures you can use. The correct choice depends on your research goals and the data you collect. Do you need to understand the mean or the differences between means? Or, perhaps you need to assess proportions. You can even use hypothesis testing to determine whether the relationships between variables are statistically significant.

To choose the proper statistical procedure, you’ll need to assess your study objectives and collect the correct type of data . This background research is necessary before you begin a study.

Related Post : Hypothesis Tests for Continuous, Binary, and Count Data

Statistical tests are crucial when you want to use sample data to draw conclusions about a population because these tests account for sampling error. Using significance levels and p-values to determine when to reject the null hypothesis improves the probability that you will draw the correct conclusion.

To see an alternative approach to these traditional hypothesis testing methods, learn about bootstrapping in statistics !

If you want to see examples of hypothesis testing in action, I recommend the following posts that I have written:

  • How Effective Are Flu Shots? This example shows how you can use statistics to test proportions.
  • Fatality Rates in Star Trek . This example shows how to use hypothesis testing with categorical data.
  • Busting Myths About the Battle of the Sexes . A fun example based on a Mythbusters episode that assesses continuous data using several different tests.
  • Are Yawns Contagious? Another fun example inspired by a Mythbusters episode.

Reader Interactions

January 14, 2024 at 8:43 am

Hello professor Jim, how are you doing! Pls. What are the properties of a population and their examples? Thanks for your time and understanding.

January 14, 2024 at 12:57 pm

Please read my post about Populations vs. Samples for more information and examples.

Also, please note there is a search bar in the upper-right margin of my website. Use that to search for topics.

July 5, 2023 at 7:05 am

Hello, I have a question as I read your post. You say in p-values section

“P-values are the probability that you would obtain the effect observed in your sample, or larger, if the null hypothesis is correct. In simpler terms, p-values tell you how strongly your sample data contradict the null. Lower p-values represent stronger evidence against the null.”

But according to your definition of effect, the null states that an effect does not exist, correct? So what I assume you want to say is that “P-values are the probability that you would obtain the effect observed in your sample, or larger, if the null hypothesis is **incorrect**.”

July 6, 2023 at 5:18 am

Hi Shrinivas,

The correct definition of p-value is that it is a probability that exists in the context of a true null hypothesis. So, the quotation is correct in stating “if the null hypothesis is correct.”

Essentially, the p-value tells you the likelihood of your observed results (or more extreme) if the null hypothesis is true. It gives you an idea of whether your results are surprising or unusual if there is no effect.

Hence, with sufficiently low p-values, you reject the null hypothesis because it’s telling you that your sample results were unlikely to have occurred if there was no effect in the population.

I hope that helps make it more clear. If not, let me know I’ll attempt to clarify!

May 8, 2023 at 12:47 am

Thanks a lot Ny best regards

May 7, 2023 at 11:15 pm

Hi Jim Can you tell me something about size effect? Thanks

May 8, 2023 at 12:29 am

Here’s a post that I’ve written about Effect Sizes that will hopefully tell you what you need to know. Please read that. Then, if you have any more specific questions about effect sizes, please post them there. Thanks!

January 7, 2023 at 4:19 pm

Hi Jim, I have only read two pages so far but I am really amazed because in few paragraphs you made me clearly understand the concepts of months of courses I received in biostatistics! Thanks so much for this work you have done it helps a lot!

January 10, 2023 at 3:25 pm

Thanks so much!

June 17, 2021 at 1:45 pm

Can you help in the following question: Rocinante36 is priced at ₹7 lakh and has been designed to deliver a mileage of 22 km/litre and a top speed of 140 km/hr. Formulate the null and alternative hypotheses for mileage and top speed to check whether the new models are performing as per the desired design specifications.

April 19, 2021 at 1:51 pm

Its indeed great to read your work statistics.

I have a doubt regarding the one sample t-test. So as per your book on hypothesis testing with reference to page no 45, you have mentioned the difference between “the sample mean and the hypothesised mean is statistically significant”. So as per my understanding it should be quoted like “the difference between the population mean and the hypothesised mean is statistically significant”. The catch here is the hypothesised mean represents the sample mean.

Please help me understand this.

Regards Rajat

April 19, 2021 at 3:46 pm

Thanks for buying my book. I’m so glad it’s been helpful!

The test is performed on the sample but the results apply to the population. Hence, if the difference between the sample mean (observed in your study) and the hypothesized mean is statistically significant, that suggests that population does not equal the hypothesized mean.

For one sample tests, the hypothesized mean is not the sample mean. It is a mean that you want to use for the test value. It usually represents a value that is important to your research. In other words, it’s a value that you pick for some theoretical/practical reasons. You pick it because you want to determine whether the population mean is different from that particular value.

I hope that helps!

November 5, 2020 at 6:24 am

Jim, you are such a magnificent statistician/economist/econometrician/data scientist etc whatever profession. Your work inspires and simplifies the lives of so many researchers around the world. I truly admire you and your work. I will buy a copy of each book you have on statistics or econometrics. Keep doing the good work. Remain ever blessed

November 6, 2020 at 9:47 pm

Hi Renatus,

Thanks so much for your very kind comments. You made my day!! I’m so glad that my website has been helpful. And, thanks so much for supporting my books! 🙂

November 2, 2020 at 9:32 pm

Hi Jim, I hope you are aware of 2019 American Statistical Association’s official statement on Statistical Significance: https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913 In case you do not bother reading the full article, may I quote you the core message here: “We conclude, based on our review of the articles in this special issue and the broader literature, that it is time to stop using the term “statistically significant” entirely. Nor should variants such as “significantly different,” “p < 0.05,” and “nonsignificant” survive, whether expressed in words, by asterisks in a table, or in some other way."

With best wishes,

November 3, 2020 at 2:09 am

I’m definitely aware of the debate surrounding how to use p-values most effectively. However, I need to correct you on one point. The link you provide is NOT a statement by the American Statistical Association. It is an editorial by several authors.

There is considerable debate over this issue. There are problems with p-values. However, as the authors state themselves, much of the problem is over people’s mindsets about how to use p-values and their incorrect interpretations about what statistical significance does and does not mean.

If you were to read my website more thoroughly, you’d be aware that I share many of their concerns and I address them in multiple posts. One of the authors’ key points is the need to be thoughtful and conduct thoughtful research and analysis. I emphasize this aspect in multiple posts on this topic. I’ll ask you to read the following three because they all address some of the authors’ concerns and suggestions. But you might run across others to read as well.

Five Tips for Using P-values to Avoid Being Misled How to Interpret P-values Correctly P-values and the Reproducibility of Experimental Results

September 24, 2020 at 11:52 pm

HI Jim, i just want you to know that you made explanation for Statistics so simple! I should say lesser and fewer words that reduce the complexity. All the best! 🙂

September 25, 2020 at 1:03 am

Thanks, Rene! Your kind words mean a lot to me! I’m so glad it has been helpful!

September 23, 2020 at 2:21 am

Honestly, I never understood stats during my entire M.Ed course and was another nightmare for me. But how easily you have explained each concept, I have understood stats way beyond my imagination. Thank you so much for helping ignorant research scholars like us. Looking forward to get hardcopy of your book. Kindly tell is it available through flipkart?

September 24, 2020 at 11:14 pm

I’m so happy to hear that my website has been helpful!

I checked on flipkart and it appears like my books are not available there. I’m never exactly sure where they’re available due to the vagaries of different distribution channels. They are available on Amazon in India.

Introduction to Statistics: An Intuitive Guide (Amazon IN) Hypothesis Testing: An Intuitive Guide (Amazon IN)

July 26, 2020 at 11:57 am

Dear Jim I am a teacher from India . I don’t have any background in statistics, and still I should tell that in a single read I can follow your explanations . I take my entire biostatistics class for botany graduates with your explanations. Thanks a lot. May I know how I can avail your books in India

July 28, 2020 at 12:31 am

Right now my books are only available as ebooks from my website. However, soon I’ll have some exciting news about other ways to obtain it. Stay tuned! I’ll announce it on my email list. If you’re not already on it, you can sign up using the form that is in the right margin of my website.

June 22, 2020 at 2:02 pm

Also can you please let me if this book covers topics like EDA and principal component analysis?

June 22, 2020 at 2:07 pm

This book doesn’t cover principal components analysis. Although, I wouldn’t really classify that as a hypothesis test. In the future, I might write a multivariate analysis book that would cover this and others. But, that’s well down the road.

My Introduction to Statistics covers EDA. That’s the largely graphical look at your data that you often do prior to hypothesis testing. The Introduction book perfectly leads right into the Hypothesis Testing book.

June 22, 2020 at 1:45 pm

Thanks for the detailed explanation. It does clear my doubts. I saw that your book related to hypothesis testing has the topics that I am studying currently. I am looking forward to purchasing it.

Regards, Take Care

June 19, 2020 at 1:03 pm

For this particular article I did not understand a couple of statements and it would great if you could help: 1)”If sample error causes the observed difference, the next time someone performs the same experiment the results might be different.” 2)”If the difference does not exist at the population level, you won’t obtain the benefits that you expect based on the sample statistics.”

I discovered your articles by chance and now I keep coming back to read & understand statistical concepts. These articles are very informative & easy to digest. Thanks for the simplifying things.

June 20, 2020 at 9:53 pm

I’m so happy to hear that you’ve found my website to be helpful!

To answer your questions, keep in mind that a central tenet of inferential statistics is that the random sample that a study drew was only one of an infinite number of possible samples it could’ve drawn. Each random sample produces different results. Most results will cluster around the population value assuming they used good methodology. However, random sampling error always exists and makes it so that population estimates from a sample almost never exactly equal the correct population value.

So, imagine that we’re studying a medication and comparing the treatment and control groups. Suppose that the medicine is truly not effective and that the population difference between the treatment and control group is zero (i.e., no difference.) Despite the true difference being zero, most sample estimates will show some degree of either a positive or negative effect thanks to random sampling error. So, just because a study has an observed difference does not mean that a difference exists at the population level. So, on to your questions:

1. If the observed difference is just random error, then it makes sense that if you collected another random sample, the difference could change. It could change from negative to positive, positive to negative, more extreme, less extreme, etc. However, if the difference exists at the population level, most random samples drawn from the population will reflect that difference. If the medicine has an effect, most random samples will reflect that fact and not bounce around on both sides of zero as much.

2. This is closely related to the previous answer. Suppose there is no difference at the population level, but you approve the medicine because of the observed effects in your sample. Even though your random sample showed an effect (which was really random error), that effect doesn’t exist. So, when you start using the medicine on a larger scale, people won’t benefit from it. That’s why it’s important to separate out what is easily explained by random error versus what is not easily explained by it.

I think reading my post about how hypothesis tests work will help clarify this process. Also, in about 24 hours (as I write this), I’ll be releasing my new ebook about Hypothesis Testing!

May 29, 2020 at 5:23 am

Hi Jim, I really enjoy your blog. Can you please link me on your blog where you discuss about Subgroup analysis and how it is done? I need to use non parametric and parametric statistical methods for my work and also do subgroup analysis in order to identify potential groups of patients that may benefit more from using a treatment than other groups.

May 29, 2020 at 2:12 pm

Hi, I don’t have a specific article about subgroup analysis. However, subgroup analysis is just the dividing up of a larger sample into subgroups and then analyzing those subgroups separately. You can use the various analyses I write about on the subgroups.

Alternatively, you can include the subgroups in regression analysis as an indicator variable and include that variable as a main effect and an interaction effect to see how the relationships vary by subgroup without needing to subdivide your data. I write about that approach in my article about comparing regression lines . This approach is my preferred approach when possible.

April 19, 2020 at 7:58 am

sir is confidence interval is a part of estimation?

April 17, 2020 at 3:36 pm

Sir can u plz briefly explain alternatives of hypothesis testing? I m unable to find the answer

April 18, 2020 at 1:22 am

Assuming you want to draw conclusions about populations by using samples (i.e., inferential statistics ), you can use confidence intervals and bootstrap methods as alternatives to the traditional hypothesis testing methods.

March 9, 2020 at 10:01 pm

Hi JIm, could you please help with activities that can best teach concepts of hypothesis testing through simulation, Also, do you have any question set that would enhance students intuition why learning hypothesis testing as a topic in introductory statistics. Thanks.

March 5, 2020 at 3:48 pm

Hi Jim, I’m studying multiple hypothesis testing & was wondering if you had any material that would be relevant. I’m more trying to understand how testing multiple samples simultaneously affects your results & more on the Bonferroni Correction

March 5, 2020 at 4:05 pm

I write about multiple comparisons (aka post hoc tests) in the ANOVA context . I don’t talk about Bonferroni Corrections specifically but I cover related types of corrections. I’m not sure if that exactly addresses what you want to know but is probably the closest I have already written. I hope it helps!

January 14, 2020 at 9:03 pm

Thank you! Have a great day/evening.

January 13, 2020 at 7:10 pm

Any help would be greatly appreciated. What is the difference between The Hypothesis Test and The Statistical Test of Hypothesis?

January 14, 2020 at 11:02 am

They sound like the same thing to me. Unless this is specialized terminology for a particular field or the author was intending something specific, I’d guess they’re one and the same.

April 1, 2019 at 10:00 am

so these are the only two forms of Hypothesis used in statistical testing?

April 1, 2019 at 10:02 am

Are you referring to the null and alternative hypothesis? If so, yes, that’s those are the standard hypotheses in a statistical hypothesis test.

April 1, 2019 at 9:57 am

year very insightful post, thanks for the write up

October 27, 2018 at 11:09 pm

hi there, am upcoming statistician, out of all blogs that i have read, i have found this one more useful as long as my problem is concerned. thanks so much

October 27, 2018 at 11:14 pm

Hi Stano, you’re very welcome! Thanks for your kind words. They mean a lot! I’m happy to hear that my posts were able to help you. I’m sure you will be a fantastic statistician. Best of luck with your studies!

October 26, 2018 at 11:39 am

Dear Jim, thank you very much for your explanations! I have a question. Can I use t-test to compare two samples in case each of them have right bias?

October 26, 2018 at 12:00 pm

Hi Tetyana,

You’re very welcome!

The term “right bias” is not a standard term. Do you by chance mean right skewed distributions? In other words, if you plot the distribution for each group on a histogram they have longer right tails? These are not the symmetrical bell-shape curves of the normal distribution.

If that’s the case, yes you can as long as you exceed a specific sample size within each group. I include a table that contains these sample size requirements in my post about nonparametric vs parametric analyses .

Bias in statistics refers to cases where an estimate of a value is systematically higher or lower than the true value. If this is the case, you might be able to use t-tests, but you’d need to be sure to understand the nature of the bias so you would understand what the results are really indicating.

I hope this helps!

April 2, 2018 at 7:28 am

Simple and upto the point 👍 Thank you so much.

April 2, 2018 at 11:11 am

Hi Kalpana, thanks! And I’m glad it was helpful!

March 26, 2018 at 8:41 am

Am I correct if I say: Alpha – Probability of wrongly rejection of null hypothesis P-value – Probability of wrongly acceptance of null hypothesis

March 28, 2018 at 3:14 pm

You’re correct about alpha. Alpha is the probability of rejecting the null hypothesis when the null is true.

Unfortunately, your definition of the p-value is a bit off. The p-value has a fairly convoluted definition. It is the probability of obtaining the effect observed in a sample, or more extreme, if the null hypothesis is true. The p-value does NOT indicate the probability that either the null or alternative is true or false. Although, those are very common misinterpretations. To learn more, read my post about how to interpret p-values correctly .

March 2, 2018 at 6:10 pm

I recently started reading your blog and it is very helpful to understand each concept of statistical tests in easy way with some good examples. Also, I recommend to other people go through all these blogs which you posted. Specially for those people who have not statistical background and they are facing to many problems while studying statistical analysis.

Thank you for your such good blogs.

March 3, 2018 at 10:12 pm

Hi Amit, I’m so glad that my blog posts have been helpful for you! It means a lot to me that you took the time to write such a nice comment! Also, thanks for recommending by blog to others! I try really hard to write posts about statistics that are easy to understand.

January 17, 2018 at 7:03 am

I recently started reading your blog and I find it very interesting. I am learning statistics by my own, and I generally do many google search to understand the concepts. So this blog is quite helpful for me, as it have most of the content which I am looking for.

January 17, 2018 at 3:56 pm

Hi Shashank, thank you! And, I’m very glad to hear that my blog is helpful!

January 2, 2018 at 2:28 pm

thank u very much sir.

January 2, 2018 at 2:36 pm

You’re very welcome, Hiral!

November 21, 2017 at 12:43 pm

Thank u so much sir….your posts always helps me to be a #statistician

November 21, 2017 at 2:40 pm

Hi Sachin, you’re very welcome! I’m happy that you find my posts to be helpful!

November 19, 2017 at 8:22 pm

great post as usual, but it would be nice to see an example.

November 19, 2017 at 8:27 pm

Thank you! At the end of this post, I have links to four other posts that show examples of hypothesis tests in action. You’ll find what you’re looking for in those posts!

Hypothesis tests #

Formal hypothesis testing is perhaps the most prominent and widely-employed form of statistical analysis. It is sometimes seen as the most rigorous and definitive part of a statistical analysis, but it is also the source of many statistical controversies. The currently-prevalent approach to hypothesis testing dates to developments that took place between 1925 and 1940, especially the work of Ronald Fisher , Jerzy Neyman , and Egon Pearson .

In recent years, many prominent statisticians have argued that less emphasis should be placed on the formal hypothesis testing approaches developed in the early twentieth century, with a correspondingly greater emphasis on other forms of uncertainty analysis. Our goal here is to give an overview of some of the well-established and widely-used approaches for hypothesis testing. We will also provide some perspectives on how these tools can be effectively used, and discuss their limitations. We will also discuss some new approaches to hypothesis testing that may eventually come to be as prominent as these classical approaches.

A falsifiable hypothesis is a statement, or hypothesis, that can be contradicted with evidence. In empirical (data-driven) research, this evidence will always be obtained through the data. In statistical hypothesis testing, the hypothesis that we formally test is called the null hypothesis . The alternative hypothesis is a second hypothesis that is our proposed explanation for what happens if the null hypothesis is wrong.

Test statistics #

The key element of a statistical hypothesis test is the test statistic , which (like any statistic) is a function of the data. A test statistic takes our entire dataset, and reduces it to one number. This one number ideally should contain all the information in the data that is relevant for assessing the two hypotheses of interest, and exclude any aspects of the data that are irrelevant for assessing the two hypotheses. The test statistic measures evidence against the null hypothesis. Most test statistics are constructed so that a value of zero represents the lowest possible level of evidence against the null hypothesis. Test statistic values that deviate from zero represent greater levels of evidence against the null hypothesis. The larger the magnitude of the test statistic, the stronger the evidence against the null hypothesis.

A major theme of statistical research is to devise effective ways to construct test statistics. Many useful ways to do this have been devised, and there is no single approach that is always the best. In this introductory course, we will focus on tests that start with an estimate of a quantity that is relevant for assessing the hypotheses, and then standardize this estimate by dividing it by its standard error. This approach is sometimes referred to as “Wald testing”, after Abraham Wald .

Testing the equality of two proportions #

As a basic example, let’s consider risk perception related to COVID-19. As you will see below, hypothesis testing can appear at first to be a fairly elaborate exercise. Using this example, we describe each aspect of this exercise in detail below.

The data and research question #

The data shown below are simulated but are designed to reflect actual surveys conducted in the United States in March of 2020. Participants were asked whether they perceive that they have a substantial risk of dying if they are infected with the novel coronavirus. The number of people stating each response, stratified by age, is shown below (only two age groups are shown):

High risk Not high risk
Age < 30 25 202
Age 60-69 30 124

Each subject’s response is binary – they either perceive themselves to be high risk, or not to be at high risk. When working with this type of data, we are usually interested in the proportion of people who provide each response within each stratum (age group). These are conditional proportions, conditioning on the age group. The numerical values of the conditional proportions are given below:

High risk Not high risk
Age < 30 0.110 0.890
Age 60-69 0.195 0.805

There are four conditional proportions in the table above – the proportion of younger people who perceive themselves to be at higher risk, 0.110=25/(25+202); the proportion of younger people who do not perceive themselves to be at high risk, 0.890=202/(25+202); the proportion of older people who perceive themselves to be at high risk 0.195=30/(30+124); and the proportion of older people who do not perceive themselves to be at high risk, 0.805=124/(30+124).
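The conditional proportions in the table can be reproduced directly from the counts. This is a minimal sketch; the numbers come straight from the table above.

```python
# Survey counts from the table: (high risk, not high risk) per age group.
counts = {
    "Age < 30": (25, 202),
    "Age 60-69": (30, 124),
}

# Conditional proportions: each count divided by its own age group's total.
for group, (high, not_high) in counts.items():
    total = high + not_high
    print(group, round(high / total, 3), round(not_high / total, 3))
```

Note that each row's proportions sum to 1 because we condition on the age group, not on the whole sample.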

The trend in the data is that younger people perceive themselves to be at lower risk of dying than older people, by a difference of 0.195-0.110=0.085 (in terms of proportions). But is this trend only present in this sample, or is it generalizable to a broader population (say the entire US population)? That is the goal of conducting a statistical hypothesis test in this setting.

The population structure #

Corresponding to our data above is the unobserved population structure, which we can denote as follows

            High risk      Not high risk
Age < 30     \(p\)           \(1-p\)
Age 60-69    \(q\)           \(1-q\)

The symbols \(p\) and \(q\) in the table above are population parameters . These are quantities that we do not know, and wish to assess using the data. In this case, our null hypothesis can be expressed as the statement \(p = q\) . We can estimate \(p\) using the sample proportion \(\hat{p} = 0.110\) , and similarly estimate \(q\) using \(\hat{q} = 0.195\) . However, these estimates do not immediately provide us with a way of expressing the evidence relating to the hypothesis that \(p=q\) . This is provided by the test statistic.

A test statistic #

As noted above, a test statistic is a reduction of the data to one number that captures all of the relevant information for assessing the hypotheses. A natural first choice for a test statistic here would be the difference in sample proportions between the two age groups, which is 0.195 - 0.110 = 0.085. There is a difference of 0.085 between the perceived risks of death in the younger and older age groups.

The difference in rates (0.085) does not on its own make a good test statistic, although it is a good start toward obtaining one. The reason for this is that the evidence underlying this difference in rates depends also on the absolute rates (0.110 and 0.195), and on the sample sizes (227 and 154). If we only know that the difference in rates is 0.085, this is not sufficient to evaluate the hypothesis in a statistical manner. A given difference in rates is much stronger evidence if it is obtained from a larger sample. If we have a difference of 0.085 with a very large sample, say one million people, then we should be almost certain that the true rates differ (i.e. the data are highly incompatible with the hypothesis that \(p=q\) ). If we have the same difference in rates of 0.085, but with a small sample, say 50 people per age group, then there would be almost no evidence for a true difference in the rates (i.e. the data are compatible with the hypothesis \(p=q\) ).

To address this issue, we need to consider the uncertainty in the estimated rate difference, which is 0.085. Recall that the estimated rate difference is obtained from the sample and therefore is almost certain to deviate somewhat from the true rate difference in the population (which is unknown). Recall from our study of standard errors that the standard error for an estimated proportion is \(\sqrt{p(1-p)/n}\) , where \(p\) is the outcome probability (here the outcome is that a person perceives a high risk of dying), and \(n\) is the sample size.

In the present analysis, we are comparing two proportions, so we have two standard errors. The estimated standard error for the younger people is \(\sqrt{0.11\cdot 0.89/227} \approx 0.021\) . The estimated standard error for the older people is \(\sqrt{0.195\cdot 0.805/154} \approx 0.032\) . Note that both standard errors are estimated, rather than exact, because we are plugging in estimates of the rates (0.11 and 0.195). Also note that the standard error for the rate among older people is greater than that for younger people. This is because the sample size for older people is smaller, and also because the estimated rate for older people is closer to 1/2.

In our previous discussion of standard errors, we saw how standard errors for independent quantities \(A\) and \(B\) can be used to obtain the standard error for the difference \(A-B\) . Applying that result here, we see that the standard error for the estimated difference in rates 0.195-0.11=0.085 is \(\sqrt{0.021^2 + 0.032^2} \approx 0.038\) .

The final step in constructing our test statistic is to construct a Z-score from the estimated difference in rates. As with all Z-scores, we proceed by taking the estimated difference in rates, and then divide it by its standard error. Thus, we get a test statistic value of \(0.085 / 0.038 \approx 2.24\) .
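The standard-error and Z-score computations above can be sketched as follows (using the exact counts rather than the rounded values, so the result differs slightly from 2.24):

```python
import math

# Z test statistic for comparing the two perceived-risk proportions.
p1, n1 = 25 / 227, 227   # younger group: estimated rate and sample size
p2, n2 = 30 / 154, 154   # older group

se1 = math.sqrt(p1 * (1 - p1) / n1)    # ≈ 0.021
se2 = math.sqrt(p2 * (1 - p2) / n2)    # ≈ 0.032
se_diff = math.sqrt(se1**2 + se2**2)   # standard error of the difference ≈ 0.038
z = (p2 - p1) / se_diff                # ≈ 2.22 (≈ 2.24 with the rounded values)
```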

A test statistic value of 2.24 is not very close to zero, so there is some evidence against the null hypothesis. But the strength of this evidence remains unclear. Thus, we must consider how to calibrate this evidence in a way that makes it more interpretable.

Calibrating the evidence in the test statistic #

By the central limit theorem (CLT), a Z-score approximately follows a normal distribution. When the null hypothesis holds, the Z-score approximately follows the standard normal distribution (recall that a standard normal distribution is a normal distribution with expected value equal to 0 and variance equal to 1). If the null hypothesis does not hold, then the test statistic continues to approximately follow a normal distribution, but it is not the standard normal distribution.

A test statistic of zero represents the least possible evidence against the null hypothesis. Here, we will obtain a test statistic of zero when the two proportions being compared are identical, i.e. exactly the same proportions of younger and older people perceive a substantial risk of dying from a disease. Even if the test statistic is exactly zero, this does not guarantee that the null hypothesis is true. However it is the least amount of evidence that the data can present against the null hypothesis.

In a hypothesis testing setting using normally-distributed Z-scores, as is the case here (due to the CLT), the standard normal distribution is the reference distribution for our test statistic. If the Z-score falls in the center of the reference distribution, there is no evidence against the null hypothesis. If the Z-score falls into either tail of the reference distribution, then there is evidence against the null hypothesis, and the further into the tails of the reference distribution the Z-score falls, the greater the evidence.

The most conventional way to quantify the evidence in our test statistic is through a probability called the p-value . The p-value has a somewhat complex definition that many people find difficult to grasp. It is the probability of observing as much or more evidence against the null hypothesis as we actually observe, calculated when the null hypothesis is assumed to be true. We will discuss some ways to think about this more intuitively below.

For our purposes, “evidence against the null hypothesis” is reflected in how far into the tails of the reference distribution the Z-score (test statistic) falls. We observed a test statistic of 2.24 in our COVID risk perception analysis. Recall that due to the “empirical rule”, 95% of the time, a draw from a standard normal distribution falls between -2 and 2. Thus, the p-value must be less than 0.05, since 2.24 falls outside this interval. The p-value can be calculated using a computer; in this case it happens to be approximately 0.025.
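The p-value for a given Z-score can be computed from the standard normal CDF; here is a stdlib-only sketch using the error function:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function (standard library only)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Two-sided p-value for the observed test statistic of 2.24: the
# probability that a standard normal draw is at least this extreme.
z = 2.24
p_value = 2.0 * (1.0 - norm_cdf(abs(z)))   # ≈ 0.025
```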

As stated above, the p-value tells us how likely it would be for us to obtain as much evidence against the null hypothesis as we observed in our actual data analysis, if we were certain that the null hypothesis were true. When the null hypothesis holds, any evidence against the null hypothesis is spurious. Thus, we will want to see stronger evidence against the null from our actual analysis than we would see if we knew that the null hypothesis were true. A smaller p-value therefore reflects more evidence against the null hypothesis than a larger p-value.

By convention, p-values of 0.05 or smaller are considered to represent sufficiently strong evidence against the null hypothesis to make a finding “statistically significant”. This threshold of 0.05 was chosen arbitrarily 100 years ago, and there is no objective reason for it. In recent years, people have argued that either a lesser or a greater p-value threshold should be used. But largely due to convention, the practice of deeming p-values smaller than 0.05 to be statistically significant continues.

Summary of this example #

Here is a restatement of the above discussion, using slightly different language. In our analysis of COVID risk perceptions, we found a difference in proportions of 0.085 between younger and older subjects, with younger people perceiving a lower risk of dying. This is a difference based on the sample of data that we observed, but what we really want to know is whether there is a difference in COVID risk perception in the population (say, all US adults).

Suppose that in fact there is no difference in risk perception between younger and older people. For instance, suppose that in the population, 15% of people believe that they have a substantial risk of dying should they become infected with the novel coronavirus, regardless of their age. Even though the rates are equal in this imaginary population (both being 15%), the rates in our sample would typically not be equal. Around 2.5% of the time (0.025 to be exact), if the rates are actually equal in the population, we would see a test statistic that is 2.24 or larger in magnitude. Since this represents a fairly rare event, we can conclude that our observed data are not compatible with the null hypothesis. We can also say that there is statistically significant evidence against the null hypothesis, and that we have “rejected” the null hypothesis at the 0.05 level.

In this data analysis, as in any data analysis, we cannot confirm definitively that the alternative hypothesis is true. But based on our data and the analysis performed above, we can claim that there is substantial evidence against the null hypothesis, using standard criteria for what is considered to be “substantial evidence”.

Comparison of means #

A very common setting where hypothesis testing is used arises when we wish to compare the means of a quantitative measurement obtained for two populations. Imagine, for example, that we have two ways of manufacturing a battery, and we wish to assess which approach yields batteries that are longer-lasting in actual use. To do this, suppose we obtain data that tells us the number of charge cycles that were completed in 200 batteries of type A, and in 300 batteries of type B. For the test developed below to be meaningful, the data must be independent and identically distributed samples.

The raw data for this study consists of 500 numbers, but it turns out that the most relevant information from the data is contained in the sample means and sample standard deviations computed within each battery type. Note that this is a huge reduction in complexity, since we started with 500 measurements and are able to summarize this down to just four numbers.

Suppose the summary statistics are as follows, where \(\bar{x}\) , \(\hat{\sigma}_x\) , and \(n\) denote the sample mean, sample standard deviation, and sample size, respectively.

Type   \(\bar{x}\)   \(\hat{\sigma}_x\)   \(n\)
A         420              70            200
B         403              90            300

The simplest measure comparing the two manufacturing approaches is the difference 420 - 403 = 17. That is, batteries of type A tend to have 17 more charge cycles compared to batteries of type B. This difference is present in our sample, but is it also true that the entire population of type A batteries has more charge cycles than the entire population of type B batteries? That is the goal of conducting a hypothesis test.

The next step in the present analysis is to divide the mean difference, which is 17, by its standard error. As we have seen, the standard error of the mean, or SEM, is \(\sigma/\sqrt{n}\) , where \(\sigma\) is the standard deviation and \(n\) is the sample size. Since \(\sigma\) is almost never known, we plug in its estimate \(\hat{\sigma}\) . For the type A batteries, the estimated SEM is thus \(70/\sqrt{200} \approx 4.95\) , and for the type B batteries the estimated SEM is \(90/\sqrt{300} \approx 5.2\) .

Since we are comparing two estimated means that are obtained from independent samples, we can combine the standard errors to obtain the standard error of the estimated difference in means, \(\sqrt{4.95^2 + 5.2^2} \approx 7.18\) . We can now obtain our test statistic \(17/7.18 \approx 2.37\) .

The test statistic can be calibrated against a standard normal reference distribution. The probability of observing a standard normal value that is greater in magnitude than 2.37 is 0.018 (this can be obtained from a computer). This is the p-value, and since it is smaller than the conventional threshold of 0.05, we can claim that there is a statistically significant difference between the average number of charge cycles for the two types of batteries, with the A batteries having more charge cycles on average.
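The full two independent samples Z-test for the battery example can be sketched as:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Summary statistics for the two battery types, from the table above.
mean_a, sd_a, n_a = 420, 70, 200
mean_b, sd_b, n_b = 403, 90, 300

sem_a = sd_a / math.sqrt(n_a)             # ≈ 4.95
sem_b = sd_b / math.sqrt(n_b)             # ≈ 5.20
se_diff = math.sqrt(sem_a**2 + sem_b**2)  # ≈ 7.18
z = (mean_a - mean_b) / se_diff           # ≈ 2.37
p_value = 2.0 * (1.0 - norm_cdf(abs(z)))  # ≈ 0.018
```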

The analysis illustrated here is called a two independent samples Z-test , or just a two sample Z-test . It may be the most commonly employed of all statistical tests. It is also common to see the very similar two sample t-test , which is different only in that it uses the Student t distribution rather than the normal (Gaussian) distribution to calculate the p-values. In fact, there are quite a few minor variations on this testing framework, including “one sided” and “two sided” tests, and tests based on different ways of pooling the variance. Due to the CLT, if the sample size is modestly large (which is the case here), the results of all of these tests will be almost identical. For simplicity, we only cover the Z-test in this course.

Assessment of a correlation #

The tests for comparing proportions and means presented above are quite similar in many ways. To provide one more example of a hypothesis test that is somewhat different, we consider a test for a correlation coefficient.

Recall that the sample correlation coefficient \(\hat{r}\) is used to assess the relationship, or association, between two quantities X and Y that are measured on the same units. For example, we may ask whether two biomarkers, serum creatinine and D-dimer, are correlated with each other. These biomarkers are both commonly used in medical settings and are obtained using blood tests. D-dimer is used to assess whether a person has blood clots, and serum creatinine is used to measure kidney performance.

Suppose we are interested in whether there is a correlation in the population between D-dimer and serum creatinine. The population correlation coefficient between these two quantities can be denoted \(r\) . Our null hypothesis is \(r=0\) . Suppose that we observe a sample correlation coefficient of \(\hat{r}=0.15\) , using an independent and identically distributed sample of pairs \((x, y)\) , where \(x\) is a D-dimer measurement and \(y\) is a serum creatinine measurement. Are these data consistent with the null hypothesis?

As above, we proceed by constructing a test statistic by taking the estimated statistic and dividing it by its standard error. The approximate standard error for \(\hat{r}\) is \(1/\sqrt{n}\) , where \(n\) is the sample size. The test statistic is therefore \(\sqrt{n}\cdot \hat{r} \approx 1.48\) .

We now calibrate this test statistic by comparing it to a standard normal reference distribution. Recall from the empirical rule that 5% of the time, a standard normal value falls outside the interval (-2, 2). Therefore, if the test statistic is smaller than 2 in magnitude, as is the case here, its p-value is greater than 0.05. Thus, in this case we know that the p-value will exceed 0.05 without calculating it, and therefore there is no basis for claiming that D-dimer and serum creatinine levels are correlated in this population.
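A sketch of this computation; note that the text does not state the sample size, so n = 97 below is an assumed value chosen to be consistent with the reported statistic of about 1.48:

```python
import math

# Test statistic for the correlation example: sqrt(n) * r_hat, since the
# approximate standard error of r_hat is 1/sqrt(n).
r_hat = 0.15
n = 97                       # assumed sample size (not given in the text)
z = math.sqrt(n) * r_hat     # ≈ 1.48, inside (-2, 2), so the p-value exceeds 0.05
```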

Sampling properties of p-values #

A p-value is the most common way of calibrating evidence. Smaller p-values indicate stronger evidence against a null hypothesis. By convention, if the p-value is smaller than some threshold, usually 0.05, we reject the null hypothesis and declare a finding to be “statistically significant”. How can we understand more deeply what this means? One major concern should be obtaining a small p-value when the null hypothesis is true. If the null hypothesis is true, then it is incorrect to reject it. If we reject the null hypothesis, we are making a false claim. This can never be prevented with complete certainty, but we would like to have a very clear understanding of how likely it is to reject the null hypothesis when the null hypothesis is in fact true.

P-values have a special property that when the null hypothesis is true, the probability of observing a p-value smaller than 0.05 is 0.05 (5%). In fact, the probability of observing a p-value smaller than \(t\) is equal to \(t\) , for any threshold \(t\) . For example, the probability of observing a p-value smaller than 0.1, when the null hypothesis is true, is 10%.

This fact gives a more concrete understanding of how strong the evidence is for a particular p-value. If we always reject the null hypothesis when the p-value is 0.1 or smaller, then over the long run we will reject the null hypothesis 10% of the time when the null hypothesis is true. If we always reject the null hypothesis when the p-value is 0.05 or smaller, then over the long run we will reject the null hypothesis 5% of the time when the null hypothesis is true.
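This sampling property is easy to check by simulation; the sketch below draws Z-scores under the null hypothesis and verifies that the resulting p-values fall below each threshold at about the stated rates:

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

random.seed(0)
n_sim = 100_000
pvals = []
for _ in range(n_sim):
    z = random.gauss(0.0, 1.0)                 # Z-score when the null holds
    pvals.append(2.0 * (1.0 - norm_cdf(abs(z))))  # two-sided p-value

frac_below_05 = sum(p < 0.05 for p in pvals) / n_sim  # close to 0.05
frac_below_10 = sum(p < 0.10 for p in pvals) / n_sim  # close to 0.10
```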

The approach to hypothesis testing discussed above largely follows the framework developed by RA Fisher around 1925. Note that although we mentioned the alternative hypothesis above, we never actually used it. A more elaborate approach to hypothesis testing was developed somewhat later by Egon Pearson and Jerzy Neyman. The “Neyman-Pearson” approach to hypothesis testing is even more formal than Fisher’s approach, and is most suited to highly planned research efforts in which the study is carefully designed, then executed. While ideally all research projects should be carried out this way, in reality we often conduct research using data that are already available, rather than using data that are specifically collected to address the research question.

Neyman-Pearson hypothesis testing involves specifying an alternative hypothesis that we anticipate encountering. Usually this alternative hypothesis represents a realistic guess about what we might find once the data are collected. In each of the three examples above, imagine that the data are not yet collected, and we are asked to specify an alternative hypothesis. We may arrive at the following:

In comparing risk perceptions for COVID, we may anticipate that older people will perceive a 30% risk of dying, and younger people will anticipate a 5% risk of dying.

In comparing the number of charge cycles for two types of batteries, we may anticipate that battery type A will have on average 500 charge cycles, and battery type B will have on average 400 charge cycles.

In assessing the correlation between D-dimer and serum creatinine levels, we may anticipate a correlation of 0.3.

Note that none of the numbers stated here are data-driven – they are specified before any data are collected, so they do not match the results from the data, which were collected only later. These alternative hypotheses are all essentially speculations, based perhaps on related data or theoretical considerations.

There are several benefits of specifying an explicit alternative hypothesis, as done here, even though it is not strictly necessary and can be avoided entirely by adopting Fisher’s approach to hypothesis testing. One benefit of specifying an alternative hypothesis is that we can use it to assess the power of our planned study, which can in turn inform the design of the study, in particular the sample size. The power is the probability of rejecting the null hypothesis when the alternative hypothesis is true. That is, it is the probability of discovering something real. The power should be contrasted with the level of a hypothesis test, which is the probability of rejecting the null hypothesis when the null hypothesis is true. That is, the level is the probability of “discovering” something that is not real.

To calculate the power, recall that for many of the test statistics that we are considering here, the test statistic has the form \(\hat{\theta}/{\rm SE}(\hat{\theta})\) , where \(\hat{\theta}\) is an estimate. For example, \(\hat{\theta}\) may be the correlation coefficient between D-dimer and serum creatinine levels. As stated above, the power is the probability of rejecting the null hypothesis when the alternative hypothesis is true. Suppose we decide to reject the null hypothesis when the test statistic is greater than 2, which is approximately equivalent to rejecting the null hypothesis when the p-value is less than 0.05. The following calculation tells us how to obtain the power in this setting:

Under the alternative hypothesis, \(\sqrt{n}(\hat{r} - r)\) approximately follows a standard normal distribution. Therefore, if \(r\) and \(n\) are given, we can easily use the computer to obtain the probability of observing a value greater than \(2 - \sqrt{n}r\) . This gives us the power of the test. For example, if we anticipate \(r=0.3\) and plan to collect data for \(n=100\) observations, the power is 0.84. This is generally considered to be good power – if the true value of \(r\) is in fact 0.3, we would reject the null hypothesis 84% of the time.
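The power calculation described here can be reproduced in a few lines (a sketch under the stated assumptions: anticipated r = 0.3, planned n = 100, rejecting when the test statistic exceeds 2):

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

r, n = 0.3, 100
# Under the alternative, sqrt(n)*(r_hat - r) is approximately standard
# normal, so the power is P(Z > 2 - sqrt(n)*r).
power = 1.0 - norm_cdf(2.0 - math.sqrt(n) * r)   # ≈ 0.84
```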

A study usually has poor power because it has too small of a sample size. Poorly powered studies can be very misleading, but since large sample sizes are expensive to collect, a lot of research is conducted using sample sizes that yield moderate or even low power. If a study has low power, it is unlikely to reject the null hypothesis even when the alternative hypothesis is true, but it remains possible to reject the null hypothesis when the null hypothesis is true (usually this probability is 5%). Therefore, when a poorly powered study does reject the null hypothesis, there is a relatively high chance that the rejection is incorrect.

Statology

Introduction to Hypothesis Testing

A statistical hypothesis is an assumption about a population parameter .

For example, we may assume that the mean height of a male in the U.S. is 70 inches.

The assumption about the height is the statistical hypothesis and the true mean height of a male in the U.S. is the population parameter .

A hypothesis test is a formal statistical test we use to reject or fail to reject a statistical hypothesis.

The Two Types of Statistical Hypotheses

To test whether a statistical hypothesis about a population parameter is true, we obtain a random sample from the population and perform a hypothesis test on the sample data.

There are two types of statistical hypotheses:

The null hypothesis , denoted as H 0 , is the hypothesis that the sample data occur purely by chance.

The alternative hypothesis , denoted as H 1 or H a , is the hypothesis that the sample data is influenced by some non-random cause.

Hypothesis Tests

A hypothesis test consists of five steps:

1. State the hypotheses. 

State the null and alternative hypotheses. These two hypotheses need to be mutually exclusive, so if one is true then the other must be false.

2. Determine a significance level to use for the hypothesis.

Decide on a significance level. Common choices are .01, .05, and .1. 

3. Find the test statistic.

Find the test statistic and the corresponding p-value. Often we are analyzing a population mean or proportion and the general formula to find the test statistic is: (sample statistic – population parameter) / (standard deviation of statistic)

4. Reject or fail to reject the null hypothesis.

Using the test statistic or the p-value, determine if you can reject or fail to reject the null hypothesis based on the significance level.

The p-value  tells us the strength of evidence against the null hypothesis. If the p-value is less than the significance level, we reject the null hypothesis.

5. Interpret the results. 

Interpret the results of the hypothesis test in the context of the question being asked. 

The Two Types of Decision Errors

There are two types of decision errors that one can make when doing a hypothesis test:

Type I error: You reject the null hypothesis when it is actually true. The probability of committing a Type I error is equal to the significance level, often called  alpha , and denoted as α.

Type II error: You fail to reject the null hypothesis when it is actually false. The probability of committing a Type II error is called  Beta , denoted as β. The power of the test is 1 − β, the probability of correctly rejecting a false null hypothesis.

One-Tailed and Two-Tailed Tests

A statistical hypothesis can be one-tailed or two-tailed.

A one-tailed hypothesis involves making a “greater than” or “less than” statement.

For example, suppose we assume the mean height of a male in the U.S. is greater than or equal to 70 inches. The null hypothesis would be H0: µ ≥ 70 inches and the alternative hypothesis would be Ha: µ < 70 inches.

A two-tailed hypothesis involves making an “equal to” or “not equal to” statement.

For example, suppose we assume the mean height of a male in the U.S. is equal to 70 inches. The null hypothesis would be H0: µ = 70 inches and the alternative hypothesis would be Ha: µ ≠ 70 inches.
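A sketch of this two-tailed test; the sample summary values below (mean 69.5, SD 3, n = 100) are hypothetical, for illustration only:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Two-tailed one-sample Z-test of H0: mu = 70 inches, using
# hypothetical sample summaries.
mu0 = 70
xbar, sd, n = 69.5, 3.0, 100
z = (xbar - mu0) / (sd / math.sqrt(n))       # ≈ -1.67
p_value = 2.0 * (1.0 - norm_cdf(abs(z)))     # ≈ 0.096: fail to reject at α = 0.05
```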

Note: The “equal” sign is always included in the null hypothesis, whether it is =, ≥, or ≤.

Related:   What is a Directional Hypothesis?

Types of Hypothesis Tests

There are many different types of hypothesis tests you can perform depending on the type of data you’re working with and the goal of your analysis.

The following tutorials provide an explanation of the most common types of hypothesis tests:

Introduction to the One Sample t-test
Introduction to the Two Sample t-test
Introduction to the Paired Samples t-test
Introduction to the One Proportion Z-Test
Introduction to the Two Proportion Z-Test


Statistics for LIS with Open Source R

Chapter 11: Fundamentals of Hypothesis Testing

Hypothesis testing refers to the process of choosing between two hypothesis statements about a probability distribution based on observed data from the distribution.  Hypothesis testing is a step-by-step methodology that allows you to make inferences about a population parameter by analyzing differences between the results observed (the sample statistic) and the results that can be expected if some underlying hypothesis is actually true.

The methodology behind hypothesis testing:
1. State the null hypothesis.
2. Select the distribution to use.
3. Determine the rejection and non-rejection regions.
4. Calculate the value of the test statistic.
5. Make a decision.

Step 1. State the null hypothesis
In this step, you set up two statements to determine the validity of a statistical claim: a null hypothesis and an alternative hypothesis.

The null hypothesis is a statement containing a null, or zero, difference. It is the null hypothesis that undergoes the testing procedure, whether it is the original claim or not. The notation for the null hypothesis H 0 represents the status quo or what is assumed to be true. It always contains the equal sign.

The alternative statement must be true if the null hypothesis is false. An alternative hypothesis is represented as H 1 . It is the opposite of the null and is what you wish to support. It also never contains the equal sign.

Step 2. Select the distribution to use
You can select a sample or the entire population. In selecting the distribution, we must know the mean for the population or the sample.

Step 3. Determine the rejection and non-rejection regions
In this step we choose the significance level . The significance level , also denoted as alpha or α, is the probability of rejecting the null hypothesis when it is true. For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.

Step 4. Calculate the value of the test statistic
The value of the test statistic separates the rejection and non-rejection regions. Rejection region:  the set of values for the test statistic that leads to rejection of  H 0 . Non-rejection region:  the set of values not in the rejection region that leads to non-rejection of  H 0 .

The P-Value:  Another quantitative measure for reporting the result of a test of hypothesis is the p-value: the probability of obtaining a test statistic equal to, or more extreme than, the observed value, computed under the assumption that H 0 is true. The lower the p-value, the less compatible the data are with the null hypothesis, so a low p-value is a good indication that the results are not due to random chance alone. We then compare the p-value with α:
1. If p-value < α, reject H 0 .
2. If p-value >= α, do not reject H 0 .
3. “If p-value is low, then H 0 must go.”

As mentioned in Chapter 8 , the logic of hypothesis testing is to reject the null hypothesis if the sample data are not consistent with the null hypothesis. Thus, one rejects the null hypothesis if the observed test statistic is more extreme in the direction of the alternative hypothesis than one can tolerate.

Step 5. Make a decision
Based on the result, you determine whether to reject or fail to reject the null hypothesis. However, when the results of a hypothesis test are reported in an academic journal, it is common to find that the author provides only the test statistic and its p-value in the conclusions drawn from the data.
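The p-value comparison rule described above can be expressed as a small helper:

```python
def decide(p_value, alpha=0.05):
    """Compare the p-value with the significance level alpha:
    reject H0 when p-value < alpha, otherwise do not reject."""
    return "reject H0" if p_value < alpha else "do not reject H0"

print(decide(0.03))   # reject H0
print(decide(0.20))   # do not reject H0
```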



6a.2 - Steps for Hypothesis Tests

The Logic of Hypothesis Testing Section

A hypothesis, in statistics, is a statement about a population parameter, where this statement typically is represented by some specific numerical value. In testing a hypothesis, we use a method where we gather data in an effort to gather evidence about the hypothesis.

How do we decide whether to reject the null hypothesis?

  • If the sample data are consistent with the null hypothesis, then we do not reject it.
  • If the sample data are inconsistent with the null hypothesis, but consistent with the alternative, then we reject the null hypothesis and conclude that the alternative hypothesis is true.

Six Steps for Hypothesis Tests Section  

In hypothesis testing, there are certain steps one must follow. Below these are summarized into six such steps to conducting a test of a hypothesis.

  • Set up the hypotheses and check conditions: Each hypothesis test includes two hypotheses about the population. One is the null hypothesis, notated as \(H_0 \), which is a statement of a particular parameter value. This hypothesis is assumed to be true until there is evidence to suggest otherwise. The second hypothesis is called the alternative, or research, hypothesis, notated as \(H_a \). The alternative hypothesis is a statement of a range of alternative values in which the parameter may fall. One must also check that any conditions (assumptions) needed to run the test have been satisfied, e.g., normality of data, independence, and number of success and failure outcomes.
  • Decide on the significance level, \(\alpha \): This value is used as a probability cutoff for making decisions about the null hypothesis. This alpha value represents the probability we are willing to place on our test for making an incorrect decision with regard to rejecting the null hypothesis. The most common \(\alpha \) value is 0.05 or 5%. Other popular choices are 0.01 (1%) and 0.1 (10%).
  • Calculate the test statistic: Gather sample data and calculate a test statistic where the sample statistic is compared to the parameter value. The test statistic is calculated under the assumption the null hypothesis is true and incorporates a measure of standard error and assumptions (conditions) related to the sampling distribution.
  • Calculate probability value (p-value), or find the rejection region: A p-value is found by using the test statistic to calculate the probability of the sample data producing such a test statistic or one more extreme. The rejection region is found by using alpha to find a critical value; the rejection region is the area that is more extreme than the critical value. We discuss the p-value and rejection region in more detail in the next section.
  • Make a decision about the null hypothesis: In this step, we decide to either reject the null hypothesis or decide to fail to reject the null hypothesis. Notice we do not make a decision where we will accept the null hypothesis.
  • State an overall conclusion : Once we have found the p-value or rejection region, and made a statistical decision about the null hypothesis (i.e. we will reject the null or fail to reject the null), we then want to summarize our results into an overall conclusion for our test.
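
The six steps above can be sketched numerically. The following is a minimal, hypothetical illustration of an upper-tailed one-sample z test in Python (the function name and sample figures are assumptions for illustration, not part of the lesson):

```python
import math

def z_test_upper(xbar, mu0, sigma, n, alpha=0.05):
    """Steps 3-5 of an upper-tailed one-sample z test of H0: mu = mu0 vs Ha: mu > mu0."""
    # Step 3: test statistic, computed under the assumption that H0 is true
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # Step 4: upper-tail p-value from the standard normal CDF
    p_value = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))
    # Step 5: reject H0 when the p-value falls below the significance level
    return z, p_value, p_value < alpha

# Hypothetical data: sample mean 112.5 from n = 30, testing mu0 = 100 with sigma = 15
z, p, reject = z_test_upper(112.5, 100, 15, 30)
```

Steps 1 (setting up hypotheses and checking conditions) and 6 (stating a conclusion in context) remain verbal steps that code cannot automate.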

We will follow these six steps for the remainder of this Lesson. In future Lessons, the steps will be followed but may not be explained explicitly.

Step 1 is a very important step to set up correctly. If your hypotheses are incorrect, your conclusion will be incorrect. In this next section, we practice with Step 1 for the one sample situations.

Hypothesis Testing

Hypothesis testing is a tool for making statistical inferences about the population data. It is an analysis tool that tests assumptions and determines how likely something is within a given standard of accuracy. Hypothesis testing provides a way to verify whether the results of an experiment are valid.

A null hypothesis and an alternative hypothesis are set up before performing the hypothesis testing. This helps to arrive at a conclusion regarding the sample obtained from the population. In this article, we will learn more about hypothesis testing, its types, steps to perform the testing, and associated examples.


What is Hypothesis Testing in Statistics?

Hypothesis testing uses sample data from the population to draw useful conclusions regarding the population probability distribution . It tests an assumption made about the data using different types of hypothesis testing methodologies. The hypothesis testing results in either rejecting or not rejecting the null hypothesis.

Hypothesis Testing Definition

Hypothesis testing can be defined as a statistical tool that is used to identify if the results of an experiment are meaningful or not. It involves setting up a null hypothesis and an alternative hypothesis. These two hypotheses will always be mutually exclusive. This means that if the null hypothesis is true then the alternative hypothesis is false and vice versa. An example of hypothesis testing is setting up a test to check if a new medicine works on a disease in a more efficient manner.

Null Hypothesis

The null hypothesis is a concise mathematical statement that is used to indicate that there is no difference between two possibilities. In other words, there is no difference between certain characteristics of data. This hypothesis assumes that the outcomes of an experiment are based on chance alone. It is denoted as \(H_{0}\). Hypothesis testing is used to conclude if the null hypothesis can be rejected or not. Suppose an experiment is conducted to check if girls are shorter than boys at the age of 5. The null hypothesis will say that they are the same height.

Alternative Hypothesis

The alternative hypothesis is an alternative to the null hypothesis. It is used to show that the observations of an experiment are due to some real effect. It indicates that there is a statistical significance between two possible outcomes and can be denoted as \(H_{1}\) or \(H_{a}\). For the above-mentioned example, the alternative hypothesis would be that girls are shorter than boys at the age of 5.

Hypothesis Testing P Value

In hypothesis testing, the p value is used to indicate whether the results obtained after conducting a test are statistically significant or not. It also indicates the probability of making an error in rejecting or not rejecting the null hypothesis. This value is always a number between 0 and 1. The p value is compared to an alpha level, \(\alpha\), or significance level. The alpha level can be defined as the acceptable risk of incorrectly rejecting the null hypothesis. The alpha level is usually chosen between 1% and 5%.

Hypothesis Testing Critical Region

All sets of values that lead to rejecting the null hypothesis lie in the critical region. Furthermore, the value that separates the critical region from the non-critical region is known as the critical value.

Hypothesis Testing Formula

Depending upon the type of data available and the size, different types of hypothesis testing are used to determine whether the null hypothesis can be rejected or not. The hypothesis testing formula for some important test statistics are given below:

  • z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\). \(\overline{x}\) is the sample mean, \(\mu\) is the population mean, \(\sigma\) is the population standard deviation and n is the size of the sample.
  • t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\). s is the sample standard deviation.
  • \(\chi ^{2} = \sum \frac{(O_{i}-E_{i})^{2}}{E_{i}}\). \(O_{i}\) is the observed value and \(E_{i}\) is the expected value.

We will learn more about these test statistics in the upcoming section.

Types of Hypothesis Testing

Selecting the correct test for performing hypothesis testing can be confusing. These tests are used to determine a test statistic on the basis of which the null hypothesis can either be rejected or not rejected. Some of the important tests used for hypothesis testing are given below.

Hypothesis Testing Z Test

A z test is a way of hypothesis testing that is used for a large sample size (n ≥ 30). It is used to determine whether there is a difference between the population mean and the sample mean when the population standard deviation is known. It can also be used to compare the mean of two samples. It is used to compute the z test statistic. The formulas are given as follows:

  • One sample: z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).
  • Two samples: z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).
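
As a quick illustration, the two-sample formula can be evaluated directly; the figures below are invented for the sketch:

```python
import math

def two_sample_z(x1, x2, sigma1, sigma2, n1, n2, delta0=0.0):
    """Two-sample z statistic for H0: mu1 - mu2 = delta0, with known population sigmas."""
    # standard error of the difference between the two sample means
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (x1 - x2 - delta0) / se

# Hypothetical samples: means 52 and 50, sigmas 3 and 4, sizes 36 and 64
z = two_sample_z(52, 50, 3, 4, 36, 64)
```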

Hypothesis Testing t Test

The t test is another method of hypothesis testing that is used for a small sample size (n < 30). It is also used to compare the sample mean and population mean. However, the population standard deviation is not known. Instead, the sample standard deviation is known. The mean of two samples can also be compared using the t test.

  • One sample: t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\).
  • Two samples: t = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}\).
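
A minimal sketch of the one-sample version, with hypothetical numbers (finding the critical value still requires a t table with n - 1 degrees of freedom):

```python
import math

def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic; compare to a t critical value with n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical sample: mean 110, s = 18, n = 5, testing mu0 = 90
t = t_statistic(110, 90, 18, 5)
```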

Hypothesis Testing Chi Square

The Chi square test is a hypothesis testing method that is used to check whether the variables in a population are independent or not. It is used when the test statistic is chi-squared distributed.
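
The statistic itself is simple to compute from observed and expected counts; a small sketch with invented counts:

```python
def chi_square_stat(observed, expected):
    """Chi-square statistic: the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: 40 and 60 observed where 50 and 50 were expected
stat = chi_square_stat([40, 60], [50, 50])
```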

One Tailed Hypothesis Testing

One tailed hypothesis testing is done when the rejection region is only in one direction. It can also be known as directional hypothesis testing because the effects can be tested in one direction only. This type of testing is further classified into the right tailed test and left tailed test.

Right Tailed Hypothesis Testing

The right tail test is also known as the upper tail test. This test is used to check whether the population parameter is greater than some value. The null and alternative hypotheses for this test are given as follows:

\(H_{0}\): The population parameter is ≤ some value

\(H_{1}\): The population parameter is > some value.

If the test statistic has a greater value than the critical value, then the null hypothesis is rejected.

Right Tail Hypothesis Testing

Left Tailed Hypothesis Testing

The left tail test is also known as the lower tail test. It is used to check whether the population parameter is less than some value. The hypotheses for this hypothesis testing can be written as follows:

\(H_{0}\): The population parameter is ≥ some value

\(H_{1}\): The population parameter is < some value.

The null hypothesis is rejected if the test statistic has a value lesser than the critical value.

Left Tail Hypothesis Testing

Two Tailed Hypothesis Testing

In this hypothesis testing method, the critical region lies on both sides of the sampling distribution. It is also known as a non-directional hypothesis testing method. The two-tailed test is used when it needs to be determined if the population parameter is different from some value. The hypotheses can be set up as follows:

\(H_{0}\): the population parameter = some value

\(H_{1}\): the population parameter ≠ some value

The null hypothesis is rejected if the test statistic falls in either tail, that is, if its absolute value is greater than the critical value.

Two Tail Hypothesis Testing

Hypothesis Testing Steps

Hypothesis testing can be easily performed in five simple steps. The most important step is to correctly set up the hypotheses and identify the right method for hypothesis testing. The basic steps to perform hypothesis testing are as follows:

  • Step 1: Set up the null hypothesis by correctly identifying whether it is the left-tailed, right-tailed, or two-tailed hypothesis testing.
  • Step 2: Set up the alternative hypothesis.
  • Step 3: Choose the correct significance level, \(\alpha\), and find the critical value.
  • Step 4: Calculate the correct test statistic (z, t, or \(\chi^{2}\)) and p-value.
  • Step 5: Compare the test statistic with the critical value or compare the p-value with \(\alpha\) to arrive at a conclusion. In other words, decide if the null hypothesis is to be rejected or not.

Hypothesis Testing Example

The best way to solve a problem on hypothesis testing is by applying the 5 steps mentioned in the previous section. Suppose a researcher claims that the mean weight of men is greater than 100 kg with a standard deviation of 15 kg. 30 men are chosen with an average weight of 112.5 kg. Using hypothesis testing, check if there is enough evidence to support the researcher's claim. The confidence level is given as 95%.

Step 1: This is an example of a right-tailed test. Set up the null hypothesis as \(H_{0}\): \(\mu\) = 100.

Step 2: The alternative hypothesis is given by \(H_{1}\): \(\mu\) > 100.

Step 3: As this is a one-tailed test, \(\alpha\) = 100% - 95% = 5%. This can be used to determine the critical value.

1 - \(\alpha\) = 1 - 0.05 = 0.95

0.95 gives the required area under the curve. Now using a normal distribution table, the area 0.95 is at z = 1.645. A similar process can be followed for a t-test. The only additional requirement is to calculate the degrees of freedom given by n - 1.
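
In place of a printed table, the inverse normal CDF in Python's standard library gives the same critical value; a minimal sketch:

```python
from statistics import NormalDist

alpha = 0.05
# z value with area 1 - alpha = 0.95 to its left, matching the table lookup of 1.645
z_critical = NormalDist().inv_cdf(1 - alpha)
```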

Step 4: Calculate the z test statistic. The z test is used because the sample size is 30, and the sample mean, population mean, and population standard deviation are all known.

z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).

\(\mu\) = 100, \(\overline{x}\) = 112.5, n = 30, \(\sigma\) = 15

z = \(\frac{112.5-100}{\frac{15}{\sqrt{30}}}\) = 4.56

Step 5: Conclusion. As 4.56 > 1.645 thus, the null hypothesis can be rejected.

Hypothesis Testing and Confidence Intervals

Confidence levels form an important part of hypothesis testing. This is because the alpha level can be determined from a given confidence level. Suppose the confidence level is 95%. Subtract it from 100%. This gives 100 - 95 = 5% or 0.05. This is the alpha value for a one-tailed hypothesis test. To obtain the alpha value for a two-tailed hypothesis test, divide this value by 2. This gives 0.05 / 2 = 0.025.
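
This relationship is a one-liner in code; the function name below is an invention for illustration:

```python
def alpha_from_confidence(confidence, two_tailed=False):
    """Significance level from a confidence level, e.g. 0.95 -> 0.05 (or 0.025 per tail)."""
    alpha = 1 - confidence
    # a two-tailed test splits alpha across the two rejection regions
    return alpha / 2 if two_tailed else alpha
```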

Related Articles:

  • Probability and Statistics
  • Data Handling

Important Notes on Hypothesis Testing

  • Hypothesis testing is a technique that is used to verify whether the results of an experiment are statistically significant.
  • It involves the setting up of a null hypothesis and an alternate hypothesis.
  • There are three types of tests that can be conducted under hypothesis testing - z test, t test, and chi square test.
  • Hypothesis testing can be classified as right tail, left tail, and two tail tests.

Examples on Hypothesis Testing

  • Example 1: The average weight of a dumbbell in a gym is 90lbs. However, a physical trainer believes that the average weight might be higher. A random sample of 5 dumbbells has an average weight of 110lbs and a standard deviation of 18lbs. Using hypothesis testing, check if the physical trainer's claim can be supported at a 95% confidence level. Solution: As the sample size is less than 30, the t-test is used. \(H_{0}\): \(\mu\) = 90, \(H_{1}\): \(\mu\) > 90 \(\overline{x}\) = 110, \(\mu\) = 90, n = 5, s = 18. \(\alpha\) = 0.05 Using the t-distribution table, the critical value is 2.132 t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\) t = 2.484 As 2.484 > 2.132, the null hypothesis is rejected. Answer: The average weight of the dumbbells may be greater than 90lbs.
  • Example 2: The average score on a test is 80 with a standard deviation of 10. With a new teaching curriculum introduced it is believed that this score will change. On randomly testing the scores of 36 students, the mean was found to be 88. With a 0.05 significance level, is there any evidence to support this claim? Solution: This is an example of two-tail hypothesis testing. The z test will be used. \(H_{0}\): \(\mu\) = 80, \(H_{1}\): \(\mu\) ≠ 80 \(\overline{x}\) = 88, \(\mu\) = 80, n = 36, \(\sigma\) = 10. \(\alpha\) = 0.05 / 2 = 0.025 The critical value using the normal distribution table is 1.96 z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\) z = \(\frac{88-80}{\frac{10}{\sqrt{36}}}\) = 4.8 As 4.8 > 1.96, the null hypothesis is rejected. Answer: There is a difference in the scores after the new curriculum was introduced.
  • Example 3: The average score of a class is 90. However, a teacher believes that the average score might be lower. The scores of 6 students were randomly measured. The mean was 82 with a standard deviation of 18. With a 0.05 significance level use hypothesis testing to check if this claim is true. Solution: The t test will be used. \(H_{0}\): \(\mu\) = 90, \(H_{1}\): \(\mu\) < 90 \(\overline{x}\) = 82, \(\mu\) = 90, n = 6, s = 18 The critical value from the t table is -2.015 t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\) t = \(\frac{82-90}{\frac{18}{\sqrt{6}}}\) t = -1.088 As -1.088 > -2.015, we fail to reject the null hypothesis. Answer: There is not enough evidence to support the claim.
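
The test statistics in the three examples above can be reproduced with one helper, since the z and t statistics share the same form (sigma versus s in the denominator):

```python
import math

def standardized_stat(xbar, mu0, sd, n):
    """(xbar - mu0) / (sd / sqrt(n)); sd is sigma for a z test or s for a t test."""
    return (xbar - mu0) / (sd / math.sqrt(n))

t1 = standardized_stat(110, 90, 18, 5)   # Example 1 (t test)
z2 = standardized_stat(88, 80, 10, 36)   # Example 2 (z test)
t3 = standardized_stat(82, 90, 18, 6)    # Example 3 (t test)
```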


FAQs on Hypothesis Testing

What is Hypothesis Testing?

Hypothesis testing in statistics is a tool that is used to make inferences about the population data. It is also used to check if the results of an experiment are valid.

What is the z Test in Hypothesis Testing?

The z test in hypothesis testing is used to find the z test statistic for normally distributed data . The z test is used when the standard deviation of the population is known and the sample size is greater than or equal to 30.

What is the t Test in Hypothesis Testing?

The t test in hypothesis testing is used when the data follows a student t distribution . It is used when the sample size is less than 30 and standard deviation of the population is not known.

What is the formula for z test in Hypothesis Testing?

The formula for a one sample z test in hypothesis testing is z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\) and for two samples is z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).

What is the p Value in Hypothesis Testing?

The p value helps to determine if the test results are statistically significant or not. In hypothesis testing, the null hypothesis can either be rejected or not rejected based on the comparison between the p value and the alpha level.

What is One Tail Hypothesis Testing?

When the rejection region is only on one side of the distribution curve then it is known as one tail hypothesis testing. The right tail test and the left tail test are two types of directional hypothesis testing.

What is the Alpha Level in Two Tail Hypothesis Testing?

To get the alpha level in a two tail hypothesis testing divide \(\alpha\) by 2. This is done as there are two rejection regions in the curve.


Hypothesis Testing: 4 Steps and Example


Hypothesis testing, sometimes called significance testing, is an act in statistics whereby an analyst tests an assumption regarding a population parameter. The methodology employed by the analyst depends on the nature of the data used and the reason for the analysis.

Hypothesis testing is used to assess the plausibility of a hypothesis by using sample data. Such data may come from a larger population or a data-generating process. The word "population" will be used for both of these cases in the following descriptions.

Key Takeaways

  • Hypothesis testing is used to assess the plausibility of a hypothesis by using sample data.
  • The test provides evidence concerning the plausibility of the hypothesis, given the data.
  • Statistical analysts test a hypothesis by measuring and examining a random sample of the population being analyzed.
  • The four steps of hypothesis testing include stating the hypotheses, formulating an analysis plan, analyzing the sample data, and analyzing the result.

How Hypothesis Testing Works

In hypothesis testing, an  analyst  tests a statistical sample, intending to provide evidence on the plausibility of the null hypothesis. Statistical analysts measure and examine a random sample of the population being analyzed. All analysts use a random population sample to test two different hypotheses: the null hypothesis and the alternative hypothesis.

The null hypothesis is usually a hypothesis of equality between population parameters; e.g., a null hypothesis may state that the population mean return is equal to zero. The alternative hypothesis is effectively the opposite of a null hypothesis. Thus, they are mutually exclusive , and only one can be true. However, one of the two hypotheses will always be true.

The null hypothesis is a statement about a population parameter, such as the population mean, that is assumed to be true.

The four steps of hypothesis testing are:

  • State the hypotheses.
  • Formulate an analysis plan, which outlines how the data will be evaluated.
  • Carry out the plan and analyze the sample data.
  • Analyze the results and either reject the null hypothesis, or state that the null hypothesis is plausible, given the data.

Example of Hypothesis Testing

If an individual wants to test that a penny has exactly a 50% chance of landing on heads, the null hypothesis would be that 50% is correct, and the alternative hypothesis would be that 50% is not correct. Mathematically, the null hypothesis is represented as H0: P = 0.5. The alternative hypothesis is shown as "Ha" and is identical to the null hypothesis, except with the equal sign struck through, meaning that it does not equal 50%.

A random sample of 100 coin flips is taken, and the null hypothesis is tested. If it is found that the 100 coin flips were distributed as 40 heads and 60 tails, the analyst would assume that a penny does not have a 50% chance of landing on heads and would reject the null hypothesis and accept the alternative hypothesis.

If there were 48 heads and 52 tails, then it is plausible that the coin could be fair and still produce such a result. In cases such as this where the null hypothesis is "accepted," the analyst states that the difference between the expected results (50 heads and 50 tails) and the observed results (48 heads and 52 tails) is "explainable by chance alone."
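
Using only the standard library, an exact two-sided binomial p-value for this coin example can be sketched (the helper name is invented; the symmetry shortcut below relies on the fair-coin null of p = 0.5):

```python
from math import comb

def binom_two_sided_p(heads, n=100):
    """Exact two-sided p-value for H0: P(heads) = 0.5 with n flips."""
    lo = min(heads, n - heads)
    # probability of a result at least as extreme in one tail; double it by symmetry
    tail = sum(comb(n, k) for k in range(lo + 1)) / 2**n
    return min(1.0, 2 * tail)

p40 = binom_two_sided_p(40)  # 40 heads, 60 tails
p48 = binom_two_sided_p(48)  # 48 heads, 52 tails
```

The 48/52 split yields a large p-value, consistent with the "explainable by chance alone" conclusion above.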

When Did Hypothesis Testing Begin?

Some statisticians attribute the first hypothesis tests to satirical writer John Arbuthnot in 1710, who studied male and female births in England after observing that in nearly every year, male births exceeded female births by a slight proportion. Arbuthnot calculated that the probability of this happening by chance was small, and therefore it was due to “divine providence.”

What are the Benefits of Hypothesis Testing?

Hypothesis testing helps assess the accuracy of new ideas or theories by testing them against data. This allows researchers to determine whether the evidence supports their hypothesis, helping to avoid false claims and conclusions. Hypothesis testing also provides a framework for decision-making based on data rather than personal opinions or biases. By relying on statistical analysis, hypothesis testing helps to reduce the effects of chance and confounding variables, providing a robust framework for making informed conclusions.

What are the Limitations of Hypothesis Testing?

Hypothesis testing relies exclusively on data and doesn’t provide a comprehensive understanding of the subject being studied. Additionally, the accuracy of the results depends on the quality of the available data and the statistical methods used. Inaccurate data or inappropriate hypothesis formulation may lead to incorrect conclusions or failed tests. Hypothesis testing can also lead to errors, such as analysts either accepting or rejecting a null hypothesis when they shouldn’t have. These errors may result in false conclusions or missed opportunities to identify significant patterns or relationships in the data.

Hypothesis testing refers to a statistical process that helps researchers determine the reliability of a study. By using a well-formulated hypothesis and set of statistical tests, individuals or businesses can make inferences about the population that they are studying and draw conclusions based on the data presented. All hypothesis testing methods have the same four-step process, which includes stating the hypotheses, formulating an analysis plan, analyzing the sample data, and analyzing the result.



8-1. The Elements of Hypothesis Testing

A manufacturer of emergency equipment asserts that a respirator that it makes delivers pure air for 75 minutes on average. A government regulatory agency is charged with testing such claims, in this case to verify that the average time is not less than 75 minutes. To do so it would select a random sample of respirators, compute the mean time that they deliver pure air, and compare that mean to the asserted time 75 minutes.

  • 8.1 The Elements of Hypothesis Testing

LEARNING OBJECTIVES

To understand the logical framework of tests of hypotheses.

To learn basic terminology connected with hypothesis testing.

To learn fundamental facts about hypothesis testing.

  • 1. Types of Hypotheses

A hypothesis about the value of a population parameter is an assertion about its value. As in the introductory example we will be concerned with testing the truth of two competing hypotheses, only one of which can be true.

Hypothesis testing is a statistical procedure in which a choice is made between a null hypothesis and an alternative hypothesis based on information in a sample.

The end result of a hypothesis testing procedure is a choice of one of the following two possible conclusions: reject the null hypothesis, or fail to reject the null hypothesis.

The following two examples illustrate the latter two cases.

EXAMPLE 1. A publisher of college textbooks claims that the average price of all hardbound college textbooks is $127.50. A student group believes that the actual mean is higher and wishes to test their belief. State the relevant null and alternative hypotheses.

[ Solution ]

EXAMPLE 2. The recipe for a bakery item is designed to result in a product that contains 8 grams of fat per serving. The quality control department samples the product periodically to ensure that the production process is working as designed. State the relevant null and alternative hypotheses.

Thus in order to make the null and alternative hypotheses easy for the student to distinguish, in every example and problem in this text we will always present one of the two competing claims about the value of a parameter with an equality.

The claim expressed with an equality is the null hypothesis . This is the same as always stating the null hypothesis in the least favorable light. So in the introductory example about the respirators, we stated the manufacturer’s claim as “the average is 75 minutes” instead of the perhaps more natural “the average is at least 75 minutes,” essentially reducing the presentation of the null hypothesis to its worst case.

The first step in hypothesis testing is to identify the null and alternative hypotheses .

  • 2. The Logic of Hypothesis Testing
  • 3. The Rejection Region
The critical value or critical values of a test of hypotheses are the number or numbers that determine the rejection region.

\(\mu_{\bar{X}} = \mu = 8.0\), \(\sigma_{\bar{X}} = \sigma/\sqrt{n} = 0.15/\sqrt{5} \approx 0.067\).

Because the rejection regions are computed based on areas in tails of distributions, as shown in Figure 8.2, hypothesis tests are classified according to the form of the alternative hypothesis in the following way.

Each of the last two forms is also called a one-tailed test .
  • 4. Two Types of Errors

The format of the testing procedure in general terms is to take a sample and use the information it contains to come to a decision about the two hypotheses. As stated before, our decision will always be either

There are four possible outcomes of a hypothesis testing procedure, as shown in the following table:

The only way to simultaneously reduce the chances of making either kind of error is to increase the sample size.
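
This trade-off can be made concrete for an upper-tailed z test by computing the Type II error probability at a specific alternative; the numbers below are hypothetical:

```python
import math
from statistics import NormalDist

def beta_upper_z(mu0, mu_alt, sigma, n, alpha=0.05):
    """Type II error probability of an upper-tailed z test, evaluated at mu = mu_alt."""
    z_crit = NormalDist().inv_cdf(1 - alpha)          # rejection threshold for alpha
    shift = (mu_alt - mu0) / (sigma / math.sqrt(n))   # standardized true effect
    return NormalDist().cdf(z_crit - shift)           # P(fail to reject | mu = mu_alt)

# With alpha held fixed at 0.05, a larger sample shrinks beta:
beta_small = beta_upper_z(100, 105, 15, 30)
beta_large = beta_upper_z(100, 105, 15, 120)
```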

  • 5. Standardizing the Test Statistic

Hypothesis testing will be considered in a number of contexts, and great unification as well as simplification results when the relevant sample statistic is standardized by subtracting its mean from it and then dividing by its standard deviation. The resulting statistic is called a standardized test statistic. In every situation treated in this and the following two chapters the standardized test statistic will have either the standard normal distribution or Student's t-distribution.

A standardized test statistic for a hypothesis test is the statistic that is formed by subtracting from the statistic of interest its mean and dividing by its standard deviation.

In every hypothesis test in this book the standardized test statistic will be governed by either the standard normal distribution or Student’s t -distribution . Information about rejection regions is summarized in the following tables:

Every instance of hypothesis testing discussed in this and the following two chapters will have a rejection region like one of the six forms tabulated in the tables above.

No matter what the context a test of hypotheses can always be performed by applying the following systematic procedure, which will be illustrated in the examples in the succeeding sections.

Systematic Hypothesis Testing Procedure: Critical Value Approach

  • Identify the null and alternative hypotheses.
  • Identify the relevant test statistic and its distribution.
  • Compute from the data the value of the test statistic.
  • Construct the rejection region.
  • Compare the value computed in Step 3 to the rejection region constructed in Step 4 and make a decision.
  • Formulate the decision in the context of the problem, if applicable.
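
A sketch of this critical value approach for a lower-tailed mean test, in the spirit of the respirator example (the data values are hypothetical):

```python
import math
from statistics import NormalDist

def lower_tailed_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Critical value approach for H0: mu = mu0 vs Ha: mu < mu0, sigma assumed known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))   # Step 3: standardized test statistic
    z_crit = NormalDist().inv_cdf(alpha)        # Step 4: rejection region is Z <= z_crit
    return z, z_crit, z <= z_crit               # Step 5: decision

# Hypothetical respirator-style data: sample mean 73.2 minutes, sigma = 6, n = 50
z, z_crit, reject = lower_tailed_z_test(73.2, 75, 6, 50)
```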

The procedure that we have outlined in this section is called the “ Critical Value Approach ” to hypothesis testing to distinguish it from an alternative but equivalent approach that will be introduced at the end of Section 8.3 "The Observed Significance of a Test" .


In the sampling that we have studied so far the goal has been to estimate a population parameter. But the sampling done by the government agency has a somewhat different objective, not so much to estimate the population mean \(\mu\) as to test an assertion, or a hypothesis, about it, namely, whether it is as large as 75 or not. The agency is not necessarily interested in the actual value of \(\mu\), just whether it is as claimed. Their sampling is done to perform a test of hypotheses, the subject of this chapter.

The null hypothesis, denoted \(H_0\), is the statement about the population parameter that is assumed to be true unless there is convincing evidence to the contrary.

The alternative hypothesis, denoted \(H_a\), is a statement about the population parameter that is contradictory to the null hypothesis, and is accepted as true only if there is convincing evidence in favor of it.

Reject \(H_0\) (and therefore accept \(H_a\)), or

Fail to reject \(H_0\) (and therefore fail to accept \(H_a\)).

The null hypothesis typically represents the status quo, or what has historically been true. In the example of the respirators, we would believe the claim of the manufacturer unless there is reason not to do so, so the null hypothesis is \(H_0: \mu = 75\). The alternative hypothesis in the example is the contradictory statement \(H_a: \mu < 75\).

The null hypothesis will always be an assertion containing an equals sign, but depending on the situation the alternative hypothesis can have any one of three forms: with the symbol "\(<\)", as in the example just discussed, with the symbol "\(>\)", or with the symbol "\(\ne\)".

\(H_0: \mu = 127.50\)

\(H_a: \mu > 127.50\)

The default option is to accept the publisher's claim unless there is compelling evidence to the contrary. Thus the null hypothesis is \(H_0: \mu = 127.50\). Since the student group thinks that the average textbook price is greater than the publisher's figure, the alternative hypothesis in this situation is \(H_a: \mu > 127.50\).

\(H_0: \mu = 8.0\)

\(H_a: \mu \ne 8.0\)

The default option is to assume that the product contains the amount of fat it was formulated to contain unless there is compelling evidence to the contrary. Thus the null hypothesis is \(H_0: \mu = 8.0\). Since to contain either more fat than desired or to contain less fat than desired are both an indication of a faulty production process, the alternative hypothesis in this situation is that the mean is different from 8.0, so \(H_a: \mu \ne 8.0\).

In "EXAMPLE 1", the textbook example, it might seem more natural that the publisher’s claim be that the average price is at most $127.50, not exactly $127.50. If the claim were made this way, then the null hypothesis would be H 0 : μ ≤ 127.50 H_0:μ≤127.50 H 0 ​ : μ ≤ 127.50 , and the value $127.50 given in the example would be the one that is least favorable to the publisher’s claim, the null hypothesis. It is always true that if the null hypothesis is retained for its least favorable value, then it is retained for every other value.

Although we will study hypothesis testing in situations other than for a single population mean (for example, for a population proportion instead of a mean, or in comparing the means of two different populations), in this section the discussion will always be given in terms of a single population mean μ.

The null hypothesis always has the form H₀: μ = μ₀ for a specific number μ₀ (in the respirator example μ₀ = 75, in the textbook example μ₀ = 127.50, and in the baked goods example μ₀ = 8.0). Since the null hypothesis is accepted unless there is strong evidence to the contrary, the test procedure is based on the initial assumption that H₀ is true. This point is so important that we will repeat it in a display:

The test procedure is based on the initial assumption that H₀ is true.

The criterion for judging between H₀ and Hₐ based on the sample data is: if the value of X̄ would be highly unlikely to occur if H₀ were true, but favors the truth of Hₐ, then we reject H₀ in favor of Hₐ. Otherwise we do not reject H₀.

Supposing for now that X̄ follows a normal distribution, when the null hypothesis is true the density function for the sample mean X̄ must be as in Figure 8.1: a bell curve centered at μ₀. Thus if H₀ is true then X̄ is likely to take a value near μ₀ and is unlikely to take values far away. Our decision procedure therefore reduces simply to:

if Hₐ has the form Hₐ: μ < μ₀ then reject H₀ if x̄ is far to the left of μ₀;

if Hₐ has the form Hₐ: μ > μ₀ then reject H₀ if x̄ is far to the right of μ₀;

if Hₐ has the form Hₐ: μ ≠ μ₀ then reject H₀ if x̄ is far away from μ₀ in either direction.

Figure 8.1 The Density Curve for X̄ if H₀ Is True

Think of the respirator example, for which the null hypothesis is H₀: μ = 75, the claim that the average time air is delivered for all respirators is 75 minutes. If the sample mean is 75 or greater then we certainly would not reject H₀ (since there is no issue with an emergency respirator delivering air even longer than claimed).

If the sample mean is slightly less than 75 then we would logically attribute the difference to sampling error and not reject H₀ either.

Values of the sample mean that are smaller and smaller are less and less likely to come from a population for which the population mean is 75. Thus if the sample mean is far less than 75, say around 60 minutes or less, then we would certainly reject H₀, because we know that it is highly unlikely that the average of a sample would be so low if the population mean were 75. This is the rare event criterion for rejection: what we actually observed (X̄ < 60) would be so rare an event if μ = 75 were true that we regard it as much more likely that the alternative hypothesis μ < 75 holds.

In summary, to decide between H₀ and Hₐ in this example we would select a “rejection region” of values sufficiently far to the left of 75, based on the rare event criterion, and reject H₀ if the sample mean X̄ lies in the rejection region, but not reject H₀ if it does not.

Each different form of the alternative hypothesis Hₐ has its own kind of rejection region:

if (as in the respirator example) Hₐ has the form Hₐ: μ < μ₀, we reject H₀ if X̄ is far to the left of μ₀, that is, to the left of some number C, so the rejection region has the form of an interval (−∞, C];

if (as in the textbook example) Hₐ has the form Hₐ: μ > μ₀, we reject H₀ if X̄ is far to the right of μ₀, that is, to the right of some number C, so the rejection region has the form of an interval [C, ∞);

if (as in the baked goods example) Hₐ has the form Hₐ: μ ≠ μ₀, we reject H₀ if X̄ is far away from μ₀ in either direction, that is, either to the left of some number C or to the right of some other number C′, so the rejection region has the form of the union of two intervals (−∞, C] ∪ [C′, ∞).

The key issue in our line of reasoning is the question of how to determine the number C, or numbers C and C′, called the critical value or critical values of the statistic, that determine the rejection region.

Suppose the rejection region is a single interval, so we need to select a single number C. Here is the procedure for doing so. We select a small probability, denoted α, say 1%, which we take as our definition of a “rare event”: an event is “rare” if its probability of occurrence is less than α. (In all the examples and problems in this text the value of α will be given already.)

The probability that X̄ takes a value in an interval is the area under its density curve and above that interval, so as shown in Figure 8.2 (drawn under the assumption that H₀ is true, so that the curve centers at μ₀), the critical value C is the value of X̄ that cuts off a tail area α in the probability density curve of X̄.

When the rejection region is in two pieces, that is, composed of two intervals, the total area above both of them must be α, so the area above each one is α/2, as also shown in Figure 8.2.

The number α is the total area of a tail or a pair of tails.

EXAMPLE 3. In the context of Example 2, suppose that it is known that the population is normally distributed with standard deviation σ = 0.15 gram, and suppose that the test of hypotheses H₀: μ = 8.0 versus Hₐ: μ ≠ 8.0 will be performed with a sample of size 5. Construct the rejection region for the test for the choice α = 0.10. Explain the decision procedure and interpret it.

Since α = 0.10, α/2 = 0.05 and z_0.05 = 1.645. The standard deviation of X̄ is σ_X̄ = σ/√n = 0.15/√5 ≈ 0.067, so C and C′ are 1.645 standard deviations of X̄ to the left and right of its mean 8.0:

C = μ₀ − z_{α/2} σ_X̄ = 8.0 − (1.645)(0.067) = 7.89
C′ = μ₀ + z_{α/2} σ_X̄ = 8.0 + (1.645)(0.067) = 8.11

The rejection region is therefore (−∞, 7.89] ∪ [8.11, ∞): reject H₀ if the sample mean falls in either tail; otherwise do not reject it.
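The arithmetic above can be checked in a few lines; this is a minimal Python sketch using only the standard library, with the values taken from Example 3 (the critical value z_0.05 = 1.645 is quoted from the example rather than computed).

```python
import math

# Rejection region for the two-tailed test H0: mu = 8.0 vs. Ha: mu != 8.0,
# with sigma = 0.15, n = 5, alpha = 0.10 (values from Example 3).
mu0, sigma, n = 8.0, 0.15, 5
z_crit = 1.645                        # z_0.05, as given in the example

se = sigma / math.sqrt(n)             # standard deviation of the sample mean
C = mu0 - z_crit * se                 # left critical value
C_prime = mu0 + z_crit * se           # right critical value

print(round(C, 2), round(C_prime, 2))   # 7.89 8.11
```

A sample mean at or below 7.89, or at or above 8.11, would lead to rejecting H₀.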

If Hₐ has the form Hₐ: μ ≠ μ₀ the test is called a two-tailed test.

If Hₐ has the form Hₐ: μ < μ₀ the test is called a left-tailed test.

If Hₐ has the form Hₐ: μ > μ₀ the test is called a right-tailed test.

reject the null hypothesis H₀ in favor of the alternative Hₐ presented, or

do not reject the null hypothesis H₀ in favor of the alternative Hₐ presented.

As the table shows, there are two ways to be right and two ways to be wrong. Typically, to reject H₀ when it is actually true is a more serious error than to fail to reject it when it is false, so the former error is labeled “Type I” and the latter “Type II.”

In a test of hypotheses, a Type I error is the decision to reject H₀ when it is in fact true. A Type II error is the decision not to reject H₀ when it is in fact not true.

Unless we perform a census we do not have certain knowledge, so we do not know whether our decision matches the true state of nature or if we have made an error. We reject H₀ if what we observe would be a “rare” event if H₀ were true. But rare events are not impossible: they occur with probability α. Thus when H₀ is true, a rare event will be observed in the proportion α of repeated similar tests, and H₀ will be erroneously rejected in those tests. Thus α is the probability that, in following the testing procedure to decide between H₀ and Hₐ, we will make a Type I error.

The number α that is used to determine the rejection region is called the level of significance of the test. It is the probability that the test procedure will result in a Type I error.

The probability of making a Type II error is too complicated to discuss in a beginning text, so we will say no more about it than this: for a fixed sample size, choosing α smaller in order to reduce the chance of making a Type I error has the effect of increasing the chance of making a Type II error.
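This tradeoff can be made concrete numerically. The sketch below assumes the setup of Example 3 and a hypothetical true mean of 7.8 grams (an assumption chosen only for illustration), and computes the Type II error probability β for two choices of α; shrinking α from 0.10 to 0.05 visibly increases β.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

mu0, sigma, n = 8.0, 0.15, 5           # setup from Example 3
se = sigma / math.sqrt(n)
mu_true = 7.8                          # hypothetical true mean (an assumption)

def beta(z_crit):
    # Type II error: P(C < Xbar < C') when the true mean is mu_true,
    # i.e., the probability of failing to reject a false H0.
    C, C_prime = mu0 - z_crit * se, mu0 + z_crit * se
    return Phi((C_prime - mu_true) / se) - Phi((C - mu_true) / se)

beta_10 = beta(1.645)   # alpha = 0.10
beta_05 = beta(1.960)   # alpha = 0.05 (smaller alpha, larger beta)
print(round(beta_10, 3), round(beta_05, 3))
```

With these numbers, halving α roughly doubles the probability of a Type II error, in line with the text.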

For example, reviewing Example 3, if instead of working with the sample mean X̄ we work with the test statistic

Z = (X̄ − 8.0) / 0.067,

then the distribution involved is standard normal and the critical values are just ±z_0.05. The extra work that was done to find that C = 7.89 and C′ = 8.11 is eliminated.
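For instance, a hypothetical sample mean of 8.2 grams (a value assumed purely for illustration) can be standardized and compared directly against ±1.645:

```python
import math

mu0, sigma, n = 8.0, 0.15, 5
z_crit = 1.645                              # z_0.05, from Example 3

xbar = 8.2                                  # hypothetical sample mean
z = (xbar - mu0) / (sigma / math.sqrt(n))   # standardized test statistic

reject = abs(z) >= z_crit                   # two-tailed decision rule
print(round(z, 2), reject)                  # 2.98 True
```

The same conclusion follows from the untransformed rule, since 8.2 lies above C′ = 8.11; standardizing simply avoids computing C and C′ at all.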

Our decision vs. the true state of nature:

                        H₀ is true                H₀ is false
  Reject H₀             Type I error (prob. α)    Correct decision
  Do not reject H₀      Correct decision          Type II error (prob. β)

When the test statistic has the standard normal distribution:

  Symbol in Hₐ    Terminology         Rejection region
  <               Left-tailed test    (−∞, −z_α]
  >               Right-tailed test   [z_α, ∞)
  ≠               Two-tailed test     (−∞, −z_{α/2}] ∪ [z_{α/2}, ∞)

When the test statistic has Student’s t-distribution, the rejection regions are the same with z replaced by t (with n − 1 degrees of freedom).

What is Hypothesis Testing? Types and Methods

  • Soumyaa Rawat
  • Jul 23, 2021


Hypothesis Testing  

Hypothesis testing is the act of testing a hypothesis or a supposition in relation to a statistical parameter. Analysts implement hypothesis testing in order to test if a hypothesis is plausible or not. 

In data science and statistics , hypothesis testing is an important step as it involves the verification of an assumption that could help develop a statistical parameter. For instance, a researcher establishes a hypothesis assuming that the average of all odd numbers is an even number. 

In order to find the plausibility of this hypothesis, the researcher will have to test it using hypothesis testing methods. Unlike a hypothesis, which is merely supposed to be true on the basis of little or no evidence, hypothesis testing requires plausible evidence in order to establish that a statistical hypothesis is true.

This is where statistics plays an important role. A number of components are involved in this process. But before understanding the process involved in hypothesis testing in research methodology, we shall first understand the types of hypotheses involved. Let us get started!

Types of Hypotheses

In data sampling, different types of hypotheses are involved in finding whether the tested samples test positive for a hypothesis or not. In this segment, we shall discover the different types of hypotheses and understand the role they play in hypothesis testing.

Alternative Hypothesis

Alternative Hypothesis (H1) or the research hypothesis states that there is a relationship between two variables (where one variable affects the other). The alternative hypothesis is the main driving force for hypothesis testing. 

It implies that the two variables are related to each other and the relationship that exists between them is not due to chance or coincidence. 

When the process of hypothesis testing is carried out, the alternative hypothesis is the main subject of the testing process. The analyst intends to test the alternative hypothesis and verifies its plausibility.

Null Hypothesis

The Null Hypothesis (H0) aims to nullify the alternative hypothesis by implying that there exists no relation between two variables in statistics. It states that the effect of one variable on the other is solely due to chance and no empirical cause lies behind it. 

The null hypothesis is established alongside the alternative hypothesis and is recognized as important as the latter. In hypothesis testing, the null hypothesis has a major role to play as it influences the testing against the alternative hypothesis. 

(Must read: What is ANOVA test? )

Non-Directional Hypothesis

The Non-directional hypothesis states that the relation between two variables has no direction. 

Simply put, it asserts that there exists a relation between two variables, but does not recognize the direction of effect, whether variable A affects variable B or vice versa. 

Directional Hypothesis

The Directional hypothesis, on the other hand, asserts the direction of effect of the relationship that exists between two variables. 

Herein, the hypothesis clearly states that variable A affects variable B, or vice versa. 

Statistical Hypothesis

A statistical hypothesis is a hypothesis that can be verified to be plausible on the basis of statistics. 

By using data sampling and statistical knowledge, one can determine the plausibility of a statistical hypothesis and find out if it stands true or not. 

(Related blog: z-test vs t-test )

Performing Hypothesis Testing  

Now that we have understood the types of hypotheses and the role they play in hypothesis testing, let us now move on to understand the process in a better manner. 

In hypothesis testing, a researcher is first required to establish two hypotheses - alternative hypothesis and null hypothesis in order to begin with the procedure. 

To establish these two hypotheses, one is required to study data samples, find a plausible pattern among the samples, and pen down a statistical hypothesis that they wish to test. 

To begin hypothesis testing, a random sample can be drawn from the population. Among the two hypotheses, alternative and null, only one can be verified to be true, yet the presence of both hypotheses is required to make the process successful.

At the end of the hypothesis testing procedure, one of the hypotheses will be rejected and the other will be supported. Even so, no hypothesis can ever be verified with 100% certainty.

(Read also: Types of data sampling techniques )

Therefore, a hypothesis can only be supported based on the statistical samples and verified data. Here is a step-by-step guide for hypothesis testing.

Establish the hypotheses

First things first, one is required to establish two hypotheses - alternative and null, that will set the foundation for hypothesis testing. 

These hypotheses initiate the testing process that involves the researcher working on data samples in order to either support the alternative hypothesis or the null hypothesis. 

Generate a testing plan

Once the hypotheses have been formulated, it is now time to generate a testing plan. A testing plan or an analysis plan involves the accumulation of data samples, determining which statistic is to be considered and laying out the sample size. 

All these factors are very important while one is working on hypothesis testing.

Analyze data samples

As soon as a testing plan is ready, it is time to move on to the analysis part. Analysis of data samples involves configuring statistical values of samples, drawing them together, and deriving a pattern out of these samples. 

While analyzing the data samples, a researcher needs to determine a set of things -

Significance Level - The level of significance is the probability of rejecting the null hypothesis when it is in fact true; it sets the threshold for how extreme a result must be before it is declared statistically significant.

Testing Method - The testing method involves a type of sampling-distribution and a test statistic that leads to hypothesis testing. There are a number of testing methods that can assist in the analysis of data samples. 

Test statistic - Test statistic is a numerical summary of a data set that can be used to perform hypothesis testing.

P-value - The p-value is the probability, computed assuming the null hypothesis is true, of obtaining a sample statistic at least as extreme as the test statistic actually observed; a small p-value indicates that the data are implausible under the null hypothesis.

Infer the results

The analysis of data samples leads to the inference of results that establishes whether the alternative hypothesis stands true or not. When the P-value is less than the significance level, the null hypothesis is rejected and the alternative hypothesis turns out to be plausible. 
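The decision rule above can be tied together in a short sketch. It assumes a one-sample z-test on a population with known standard deviation; all the numbers below (sample mean 10.5, null mean 10.0, σ = 1.2, n = 36, α = 0.05) are hypothetical values chosen only for illustration.

```python
import math

def z_test_p_value(xbar, mu0, sigma, n):
    """Two-sided p-value of a one-sample z-test (stdlib only)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic
    return math.erfc(abs(z) / math.sqrt(2))     # equals 2 * P(Z >= |z|)

alpha = 0.05                                    # significance level
p = z_test_p_value(xbar=10.5, mu0=10.0, sigma=1.2, n=36)

# Inference step: reject H0 exactly when the p-value falls below alpha.
print("reject H0" if p < alpha else "fail to reject H0")   # prints "reject H0"
```

Swapping in a different sample mean closer to 10.0 would raise the p-value above α and flip the decision to "fail to reject H0".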

Methods of Hypothesis Testing

As we have already looked into different aspects of hypothesis testing, we shall now look into the different methods of hypothesis testing. All in all, there are 2 most common types of hypothesis testing methods. They are as follows -

Frequentist Hypothesis Testing

The frequentist approach, or the traditional approach to hypothesis testing, is a hypothesis testing method that bases its conclusions on the current data alone. 

The supposed truths and assumptions are based on the current data and a set of 2 hypotheses are formulated. A very popular subtype of the frequentist approach is the Null Hypothesis Significance Testing (NHST). 

The NHST approach (involving the null and alternative hypothesis) has been one of the most sought-after methods of hypothesis testing in the field of statistics ever since its inception in the mid-1950s. 

Bayesian Hypothesis Testing

A more unconventional and modern method of hypothesis testing, Bayesian hypothesis testing tests a particular hypothesis by combining past data, known as the prior probability, with current data to assess the plausibility of the hypothesis. 

The result obtained indicates the posterior probability of the hypothesis. In this method, the researcher relies on ‘prior probability and posterior probability’ to conduct hypothesis testing on hand. 

On the basis of this prior probability, the Bayesian approach tests a hypothesis to be true or false. The Bayes factor, a major component of this method, is the likelihood ratio between the alternative hypothesis and the null hypothesis. 

The Bayes factor is the indicator of the plausibility of either of the two hypotheses that are established for hypothesis testing.  
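As a minimal illustration of a Bayes factor, the likelihood ratio can be computed directly for two simple point hypotheses about a success rate. All numbers here are hypothetical: 9 of 10 subjects responding, a null rate of 0.5, and an alternative rate of 0.75.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical data: 9 of 10 subjects respond positively.
k, n = 9, 10
likelihood_h1 = binom_pmf(k, n, 0.75)   # alternative: success rate 0.75
likelihood_h0 = binom_pmf(k, n, 0.50)   # null: chance success rate 0.50

# Bayes factor in favor of H1 over H0 (likelihood ratio for point hypotheses)
bayes_factor = likelihood_h1 / likelihood_h0
print(round(bayes_factor, 2))           # 19.22
```

A Bayes factor near 19 would be read as strong evidence for the alternative rate over chance; a value near 1 would favor neither hypothesis.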

(Also read - Introduction to Bayesian Statistics ) 

To conclude, hypothesis testing, a way to verify the plausibility of a supposed assumption, can be done through different methods: the Bayesian approach or the frequentist approach. 

While the Bayesian approach incorporates the prior probability of data samples, the frequentist approach draws on the observed data alone. The elements involved in hypothesis testing include the significance level, the p-value, the test statistic, and the method of hypothesis testing. 

(Also read: Introduction to probability distributions )

A significant way to determine whether a hypothesis stands true or not is to verify the data samples and identify the plausible hypothesis among the null hypothesis and alternative hypothesis. 


dataanalysisclassroom
making data analysis easy

Lesson 85 – The elements of hypothesis testing


Joe and Devine are having a casual conversation in a coffee shop. While Devine orders his usual espresso, Joe orders a new item on the card, the memory booster mocha .

Devine : Do you think this “memory booster” will work?

Joe : Apparently, their success rate is 75%, and they advertise that they are better than chance.

Devine : Hmm. So we can consider you as a subject of the experiment and test their claim.

Joe : Like a hypothesis test?

Devine : Yes. We can collect the data from all the subjects who participated in the test and verify, statistically, if their 75% is sufficiently different from the random chance of 50%.

Joe : Is there a formal way to prove this?

Devine : Once we collect the data, we can compute the probability of the data under the assumption of a null hypothesis, and if this probability is less than a certain threshold, we can say with some confidence that the data is incompatible with the null hypothesis. We can reject the null hypothesis. You must be familiar with “proof by contradiction” from your classes in logic.

Joe : Null hypothesis? How do we establish that? Moreover, there will be a lot of sampling variability, and depending on what sample we get, the results may be different. How can there be a complete contradiction?

Devine : That is a good point, Joe. It is possible to get samples that will show no memory-boosting effect, in which case, we cannot contradict. Since we are basing our decisions on the probability calculated from the sample data given a null hypothesis, we should say we are proving by low-probability . It is possible that we err on our decision 😉

Joe : There seem to be several concepts here that I may have to understand carefully. Can we dissect them and take it sip-by-sip!

Devine : Absolutely. Let’s go over the essential elements of hypothesis tests today and then, in the following weeks, we can dig deeper. I will introduce you to some new terms today, but we will learn about their details in later lessons. The hypothesis testing concepts are vast. While we may only look at the surface, we will emphasize the philosophical underpinnings that will give you the required arsenal to go forward.

Joe : 😎 😎 😎

Devine : Let’s start with a simple classification of the various types of hypothesis tests; one-sample tests and two or more sample tests .

A one-sample hypothesis is a statement about the parameter of the population; or, it is a statement about the probability distribution of a random variable.

Our discussion today is on whether or not a certain proportion of subjects taking the memory-boosting mocha improve their memory. The test is to see if this proportion is significantly different from 50%. We are verifying whether the parameter (proportion, p ) is equal to or different from 50%. So it is a one-sample hypothesis test .

The value that we compare the parameter on can be based on experience or knowledge of the process, based on some theory, or based on some design considerations or obligations. If it is based on experience or prior knowledge of the process, then we are verifying whether or not the parameter has changed. If it is based on some theory , then we are testing the theory. Our coffee example will fall under this criterion. We know that random chance means a 50% probability of improving (or not) the memory. So we test the proportion against this model; p = 0.5. If the parameter is compared against a value based on some design consideration or obligation, then we are testing for compliance.

Sometimes, we have to test one sample against another sample. For example, people who take the memory-boosting test from New York City may be compared with people taking the test from San Francisco. This type of test is a two- or multiple-sample hypothesis test, where we determine whether a random variable differs in its parameter among the two or more groups.

Joe : So, that is one-sample tests or two-sample tests.

Devine : Yes. Now, for any of these two types, we can further classify them into parametric tests or nonparametric tests .

If we assume that the data has a particular probability distribution, the test can be developed based on this probability distribution. These are called parametric tests .

If a probability distribution is appropriate for the data, then, the information contained in the data can be summarized using the parameters of this distribution; like the mean, standard deviation, proportion, etc. The hypothesis test can be designed using these parameters. The entire process becomes very efficient since we already know the mathematical formulations. In our case, since we are testing for proportion, we can assume a binomial distribution to derive the probabilities.

Joe : What if the data does not follow the distribution that we assume?

Devine : This is possible. If we make incorrect assumptions regarding the probability distributions, the parameters that we use to summarize the data are at best, a poor representation of the data, which will result in incorrect conclusions.

Joe : So I believe the nonparametric tests are an alternative to this.

Devine : That is correct. There are hypothesis tests that do not require the assumption that the data follow a particular probability distribution. Do you recall the bootstrap where we used the data to approximate the probability distribution function of the population?

Joe : Yes, I remember that. We did not have to make any assumption for deriving the confidence intervals.

Devine : Exactly. These types of tests are called nonparametric hypothesis tests . Information is efficiently extracted from the data without summarizing it into statistics or parameters.

Here, I prepared a simple chart to show these classifications.


Joe : Is there a systematic process for the hypothesis test? Are there steps that I can follow?

Devine : Of course. We can follow these five steps for any hypothesis test. Let’s use our memory-booster test as a case in point as we elaborate on these steps.

1. Choose the appropriate type of test: one-sample or two-sample, parametric or nonparametric.
2. Establish the null and the alternate hypotheses.
3. Decide on an acceptable rate of error, the level of significance α.
4. Compute the test statistic and its p-value from the observed data.
5. Make the decision: reject the null hypothesis if the p-value is smaller than α; otherwise, do not reject it.

Joe : Awesome. We discussed the choice of the test — one-sample or two-sample; parametric vs. nonparametric. The choice between parametric or nonparametric test should be based on the expected distribution of the data.

Devine : Yes, if we are comfortable with the assumption of a probability distribution for the data, a parametric test may be used. If there is little information about the prior process, then it is beneficial to use the nonparametric tests. Nonparametric tests are also especially appropriate for small data sets.

As I already told you, we can assume a binomial distribution for the data on the number of people showing signs of improvement after taking the memory-boosting mocha.

Suppose ten people take the test; then the probabilities can be derived from a binomial distribution with n = 10 and p = 0.5. The null distribution , i.e., what may happen by chance, is this binomial distribution with n = 10 and p = 0.5, and we can check how far out on this distribution our observed proportion is.
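The null distribution Devine describes can be tabulated directly; this is a sketch using only the Python standard library, with n = 10 and p = 0.5 taken from the dialogue.

```python
from math import comb

n, p0 = 10, 0.5

# P(X = k) under the null model Binomial(10, 0.5), for k = 0, 1, ..., 10
null_pmf = [comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]

# The distribution peaks at k = 5 (pure chance); counts far from 5 are rare.
most_likely = null_pmf.index(max(null_pmf))
print(most_likely, round(null_pmf[most_likely], 4))   # 5 0.2461
```

Observed counts like 9 or 10 sit far out in the right tail of this distribution, which is what makes them evidence against chance.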

Joe : What about the alternate hypothesis ?

Devine : If the null hypothesis is that the memory-booster has no impact, we would expect, on average, a 50% probability of success, i.e., around 5 out of 10 people will see the effect purely by chance. Now, the coffee shop claims that their new product is effective, beyond random possibility. We call this claim the alternate hypothesis.

The null hypothesis H₀ is identified with the hypothesis of no change from the current belief.

The alternate hypothesis can be of two types, the one-sided alternative or the two-sided alternative .

Our test is a one-sided alternative hypothesis test. The proportion of people who would benefit from the memory-booster coffee is greater than the proportion who would claim benefit randomly.

It is usually the case that the null hypothesis is the favored claim. The onus of proof is on the alternative, i.e., we will continue to believe in H₀, the status quo, unless the experimental evidence strongly contradicts it; proof by low-probability.

Devine : Think about the possible outcomes of your hypothesis test.

Joe : We will either reject the null hypothesis or accept the null hypothesis.

Devine : Right. Let’s say we either reject the null hypothesis or fail to reject the null hypothesis if the data is inconclusive. Now, would your decision always be correct?

Joe : Not necessarily?

Devine : Right. We may commit errors in our decision. Suppose we select a small rejection rate, α = 5%. A 5% rejection rate implies that we are rejecting the null hypothesis 5% of the time when it is in fact true. This is the first type of error, Type I. If we want to make this error less often, we can choose a smaller rate, say α = 1%.

Joe : I think I understand. But some things are still not evident.

Devine : Don’t worry. We will get to the bottom of it as we do more and more hypothesis tests. There is another kind of error, the second type, Type II . It is the probability of not rejecting the null hypothesis when it is false. For example, suppose the coffee does boost the memory, but a sample of people did not show that effect, we would fail to reject the null hypothesis. In this case, we would have committed a Type II error.

Type II error is also called the lack of power in the test.

Some attention to these two Types shows that Type I and Type II errors are inversely related.

Joe : 😐 😐 😐

Devine : I promise. These things will be evident as we discuss more. Let me show all these possibilities in a table.


Joe : Two more steps. What are the test statistic and the p-value ?

Devine : The test statistic is a numerical summary of the sample, computed so that its distribution under the null hypothesis is known. In our example, it is the number of people, out of ten, who show the memory-boosting effect. If, say, nine of the ten people improve, the p-value is P(X ≥ 9), computed from the null binomial distribution.

The p-value is the probability of obtaining a test statistic at least as extreme as the one computed, under the null hypothesis. It is the evidence, or lack thereof, against the null hypothesis. The smaller the p-value, the less likely the observed statistic is under the null hypothesis, and the stronger the evidence for rejecting the null.
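The tail probability itself is a one-liner. Assuming nine of the ten subjects improved (a hypothetical count used for illustration), the p-value P(X ≥ 9) under Binomial(10, 0.5) is:

```python
from math import comb

n, p0 = 10, 0.5
observed = 9                 # hypothetical observed count of improved subjects

# p-value = P(X >= 9) under the null distribution Binomial(10, 0.5)
p_value = sum(comb(n, k) * p0**k * (1 - p0)**(n - k)
              for k in range(observed, n + 1))

print(round(p_value, 4))     # 0.0107
```

A p-value of about 1% would count as strong evidence against chance at the usual 5% significance level.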


Devine : Excellent. What we went through now is the procedure for any hypothesis test. Over the next few weeks, we will undertake several examples that will need a step-by-step hypothesis test to understand the evidence and make decisions. We will also learn the concepts of Type I and Type II errors at length. Till then, here is a summary of the steps.

And remember,

The null hypothesis is never “accepted,” or proven to be true. It is assumed to be true until proven otherwise and is “not rejected” when there is insufficient evidence to do so.

If you find this useful, please like, share and subscribe. You can also follow me on Twitter  @realDevineni  for updates on new lessons.


Understanding Hypothesis Testing


Hypothesis testing involves formulating assumptions about population parameters based on sample statistics and rigorously evaluating these assumptions against empirical evidence. This article sheds light on the significance of hypothesis testing and the critical steps involved in the process.

What is Hypothesis Testing?

A hypothesis is an assumption or idea, specifically a statistical claim about an unknown population parameter. For example, a judge assumes a person is innocent and verifies this by reviewing evidence and hearing testimony before reaching a verdict.

Hypothesis testing is a statistical method for making a decision about a population parameter using experimental data. It evaluates two mutually exclusive statements about the population to determine which statement is best supported by the sample data.

To test the validity of the claim or assumption about the population parameter:

  • A sample is drawn from the population and analyzed.
  • The results of the analysis are used to decide whether the claim is true or not.
Example: You claim that the average height in the class is 30, or that a boy is taller than a girl. These are assumptions, and we need a statistical method to verify whether what we are assuming is actually true.

Defining Hypotheses

  • Null hypothesis (H 0 ): In statistics, the null hypothesis is a general statement or default position that there is no relationship between two measured cases or no difference among groups; it is the baseline assumption made from knowledge of the problem. Example: A company’s mean production is 50 units per day, i.e. H 0 : [Tex]\mu [/Tex] = 50.
  • Alternative hypothesis (H 1 ): The alternative hypothesis is the hypothesis used in hypothesis testing that is contrary to the null hypothesis. Example: The company’s mean production is not equal to 50 units per day, i.e. H 1 : [Tex]\mu \ne [/Tex] 50.

Key Terms of Hypothesis Testing

  • Level of significance : the threshold at which we reject the null hypothesis. Since 100% certainty is impossible, we select a significance level, denoted [Tex]\alpha[/Tex] and usually 0.05 (5%): we accept a 5% risk of rejecting a true null hypothesis, which corresponds to requiring roughly 95% confidence that another sample would give a similar result.
  • P-value: the calculated probability of obtaining results as extreme as, or more extreme than, those observed, assuming the null hypothesis (H0) of the given problem is true. If the p-value is less than the chosen significance level, you reject the null hypothesis, i.e. the sample supports the alternative hypothesis.
  • Test Statistic: The test statistic is a numerical value calculated from sample data during a hypothesis test, used to determine whether to reject the null hypothesis. It is compared to a critical value or p-value to make decisions about the statistical significance of the observed results.
  • Critical value : The critical value in statistics is a threshold or cutoff point used to determine whether to reject the null hypothesis in a hypothesis test.
  • Degrees of freedom: Degrees of freedom reflect the amount of independent information available when estimating a parameter. They are related to the sample size and determine the shape of sampling distributions such as the t-distribution.
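These key terms can be tied together in a few lines of code. This is a sketch with made-up numbers (a one-sample z-test with known population standard deviation); the values 52, 50, 10, and 100 are illustrative assumptions, not from the article.

```python
import math
from scipy import stats

# Hypothetical numbers: sample mean 52, claimed population mean 50,
# known population standard deviation 10, sample size 100
x_bar, mu0, sigma, n, alpha = 52, 50, 10, 100, 0.05

# Test statistic
z = (x_bar - mu0) / (sigma / math.sqrt(n))   # 2.0

# Critical value for a two-tailed test at the 5% significance level
z_crit = stats.norm.ppf(1 - alpha / 2)       # about 1.96

# Two-tailed p-value
p_value = 2 * stats.norm.sf(abs(z))          # about 0.0455

print(z, z_crit, p_value)
```

Here the statistic (2.0) exceeds the critical value (about 1.96) and the p-value falls below alpha, so both decision rules agree: reject the null.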

Why do we use Hypothesis Testing?

Hypothesis testing is an important procedure in statistics. It evaluates two mutually exclusive statements about a population to determine which statement is better supported by sample data. When we say that findings are statistically significant, it is thanks to hypothesis testing.

One-Tailed and Two-Tailed Test

One tailed test focuses on one direction, either greater than or less than a specified value. We use a one-tailed test when there is a clear directional expectation based on prior knowledge or theory. The critical region is located on only one side of the distribution curve. If the sample falls into this critical region, the null hypothesis is rejected in favor of the alternative hypothesis.

One-Tailed Test

There are two types of one-tailed test:

  • Left-Tailed (Left-Sided) Test: The alternative hypothesis asserts that the true parameter value is less than the value stated in the null hypothesis. Example: H 0 : [Tex]\mu \geq 50 [/Tex] and H 1 : [Tex]\mu < 50 [/Tex]
  • Right-Tailed (Right-Sided) Test : The alternative hypothesis asserts that the true parameter value is greater than the value stated in the null hypothesis. Example: H 0 : [Tex]\mu \leq50 [/Tex] and H 1 : [Tex]\mu > 50 [/Tex]

Two-Tailed Test

A two-tailed test considers both directions, greater than and less than a specified value. We use a two-tailed test when there is no specific directional expectation and we want to detect any significant difference.

Example: H 0 : [Tex]\mu = [/Tex] 50 and H 1 : [Tex]\mu \neq 50 [/Tex]
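For the same observed statistic, one- and two-tailed p-values differ by a factor of two. A sketch, with an arbitrary z-statistic of 1.8 chosen purely for illustration:

```python
from scipy import stats

z = 1.8  # hypothetical observed z-statistic

p_right = stats.norm.sf(z)         # right-tailed: P(Z >= 1.8), about 0.036
p_two = 2 * stats.norm.sf(abs(z))  # two-tailed: P(|Z| >= 1.8), about 0.072

print(p_right, p_two)
```

At alpha = 0.05 the right-tailed test would reject the null here while the two-tailed test would not, which is why the choice of tails must be fixed before looking at the data.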


What are Type 1 and Type 2 errors in Hypothesis Testing?

In hypothesis testing, Type I and Type II errors are two possible errors that researchers can make when drawing conclusions about a population based on a sample of data. These errors are associated with the decisions made regarding the null hypothesis and the alternative hypothesis.

  • Type I error: rejecting the null hypothesis when it is actually true. The probability of a Type I error is denoted by alpha ( [Tex]\alpha [/Tex] ).
  • Type II error: failing to reject the null hypothesis when it is false. The probability of a Type II error is denoted by beta ( [Tex]\beta [/Tex] ).


Decision \ Reality              | Null Hypothesis is True        | Null Hypothesis is False
Fail to Reject H0 (Accept H0)   | Correct Decision               | Type II Error (False Negative)
Reject H0 (Accept H1)           | Type I Error (False Positive)  | Correct Decision
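The claim that alpha is the Type I error rate can be checked by simulation. In this sketch (population parameters chosen arbitrarily), every sample is drawn from a population where H0 is true, so any rejection is a Type I error; the rejection rate should land near 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n_sims = 0.05, 10_000
rejections = 0

for _ in range(n_sims):
    # H0 is true by construction: the sample really comes from N(50, 5)
    sample = rng.normal(loc=50, scale=5, size=30)
    _, p = stats.ttest_1samp(sample, popmean=50)
    if p <= alpha:
        rejections += 1  # a Type I error

type_i_rate = rejections / n_sims
print(type_i_rate)  # close to 0.05
```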

How does Hypothesis Testing work?

Step 1: Define Null and Alternative Hypotheses

State the null hypothesis ( [Tex]H_0 [/Tex] ), representing no effect, and the alternative hypothesis ( [Tex]H_1 [/Tex] ​), suggesting an effect or difference.

We first identify the problem about which we want to make a claim, keeping in mind that the two hypotheses must contradict one another. In what follows, the data are assumed to be normally distributed.

Step 2: Choose the Significance Level

Select a significance level ( [Tex]\alpha [/Tex] ), typically 0.05, to determine the threshold for rejecting the null hypothesis. It provides validity to our hypothesis test, ensuring that we have sufficient evidence to back up our claims. The significance level is fixed before running the test; the p-value computed from the data is then compared against it.

Step 3: Collect and Analyze Data

Gather relevant data through observation or experimentation. Analyze the data using appropriate statistical methods to obtain a test statistic.

Step 4: Calculate the Test Statistic

In this step the data are evaluated: we compute a score based on the characteristics of the data. The choice of test statistic depends on the type of hypothesis test being conducted.

There are various hypothesis tests, each appropriate for a different goal. The statistic could come from a Z-test , Chi-square test, T-test , and so on.

  • Z-test : used when the population mean and standard deviation are known.
  • t-test : used when the population standard deviation is unknown and the sample size is small.
  • Chi-square test : used for categorical data or for testing independence in contingency tables.
  • F-test : often used in analysis of variance (ANOVA) to compare variances or test the equality of means across multiple groups.

With a small dataset, the T-test is the more appropriate choice for testing our hypothesis.

T-statistic is a measure of the difference between the means of two groups relative to the variability within each group. It is calculated as the difference between the sample means divided by the standard error of the difference. It is also known as the t-value or t-score.
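As a sketch, the one-sample t-statistic can be computed by hand and checked against scipy.stats.ttest_1samp; the sample values and the claimed mean of 50 are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical small sample and a claimed population mean
sample = np.array([48.2, 51.1, 49.5, 52.3, 50.8, 47.9, 51.6, 49.0])
mu0 = 50

# Manual t-statistic: (x_bar - mu0) / (s / sqrt(n))
n = len(sample)
x_bar = sample.mean()
s = sample.std(ddof=1)  # sample standard deviation (n - 1 in the denominator)
t_manual = (x_bar - mu0) / (s / np.sqrt(n))

# The same statistic from scipy
t_scipy, p_value = stats.ttest_1samp(sample, popmean=mu0)

print(t_manual, t_scipy)
```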

Step 5: Compare the Test Statistic

In this stage, we decide whether to reject the null hypothesis. There are two ways to make this decision.

Method A: Using Critical Values

Comparing the test statistic and tabulated critical value we have,

  • If |Test Statistic| > Critical Value: Reject the null hypothesis.
  • If |Test Statistic| ≤ Critical Value: Fail to reject the null hypothesis.

(For a one-tailed test, the signed statistic is compared with the critical value in the relevant tail.)

Note: Critical values are predetermined threshold values used to make a decision in hypothesis testing. To determine critical values, we typically refer to a statistical distribution table , such as the normal distribution or t-distribution table.
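In code, the inverse CDF (ppf) plays the role of the printed tables. A sketch of Method A for a two-tailed t-test; the observed statistic 2.5 and df = 9 are hypothetical.

```python
from scipy import stats

alpha, df = 0.05, 9
t_stat = 2.5  # hypothetical observed t-statistic

# Two-tailed critical value from the t-distribution
t_crit = stats.t.ppf(1 - alpha / 2, df)  # about 2.262 for df = 9

if abs(t_stat) > t_crit:
    decision = "Reject the null hypothesis"
else:
    decision = "Fail to reject the null hypothesis"

print(t_crit, decision)
```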

Method B: Using P-values

We can also come to a conclusion using the p-value:

  • If the p-value is less than or equal to the significance level, i.e. ( [Tex]p\leq\alpha [/Tex] ), you reject the null hypothesis. This indicates that the observed results are unlikely to have occurred by chance alone, providing evidence in favor of the alternative hypothesis.
  • If the p-value is greater than the significance level, i.e. ( [Tex]p > \alpha[/Tex] ), you fail to reject the null hypothesis. This suggests that the observed results are consistent with what would be expected under the null hypothesis.

Note : The p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the one observed in the sample, assuming the null hypothesis is true. To determine the p-value, we typically refer to a statistical distribution table , such as the normal distribution or t-distribution table.

Step 6: Interpret the Results

Finally, we draw our conclusion using Method A or Method B.

Calculating test statistic

To validate our hypothesis about a population parameter we use statistical functions . For normally distributed data , we use the z-score, p-value, and level of significance (alpha) to weigh the evidence for our hypothesis.

1. Z-statistics:

When population means and standard deviations are known.

[Tex]z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}[/Tex]

  • [Tex]\bar{x} [/Tex] is the sample mean,
  • μ represents the population mean, 
  • σ is the population standard deviation,
  • and n is the size of the sample.

2. T-Statistics

The t-test is used when the population standard deviation is unknown and the sample size is small (commonly n < 30). The t-statistic is given by:

[Tex]t=\frac{x̄-μ}{s/\sqrt{n}} [/Tex]

  • t = t-score,
  • x̄ = sample mean
  • μ = population mean,
  • s = standard deviation of the sample,
  • n = sample size

3. Chi-Square Test

The chi-square test for independence applies to categorical data (not assumed to be normally distributed):

[Tex]\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}[/Tex]

  • [Tex]O_{ij}[/Tex] is the observed frequency in cell [Tex]{ij} [/Tex]
  • i, j are the row and column indices, respectively.
  • [Tex]E_{ij}[/Tex] is the expected frequency in cell [Tex]{ij}[/Tex] , calculated as : [Tex]\frac{{\text{{Row total}} \times \text{{Column total}}}}{{\text{{Total observations}}}}[/Tex]
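A sketch of this formula on a small, made-up 2x2 contingency table, computed both from the definition and with scipy.stats.chi2_contingency:

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table of observed frequencies
observed = np.array([[30, 20],
                     [20, 30]])

# Expected frequencies: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Chi-square statistic from the definition
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# scipy applies Yates' continuity correction to 2x2 tables by default,
# so turn it off to match the raw formula
chi2_scipy, p, df, _ = stats.chi2_contingency(observed, correction=False)

print(chi2_manual, chi2_scipy, df)
```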

Real life Examples of Hypothesis Testing

Let’s examine hypothesis testing using two real life situations,

Case A: Does a New Drug Affect Blood Pressure?

Imagine a pharmaceutical company has developed a new drug that they believe can effectively lower blood pressure in patients with hypertension. Before bringing the drug to market, they need to conduct a study to assess its impact on blood pressure.

  • Before Treatment: 120, 122, 118, 130, 125, 128, 115, 121, 123, 119
  • After Treatment: 115, 120, 112, 128, 122, 125, 110, 117, 119, 114

Step 1 : Define the Hypothesis

  • Null Hypothesis (H 0 ): The new drug has no effect on blood pressure.
  • Alternate Hypothesis (H 1 ): The new drug has an effect on blood pressure.

Step 2: Define the Significance level

Let’s set the significance level at 0.05: we reject the null hypothesis if the evidence suggests less than a 5% chance of observing the results due to random variation alone.

Step 3 : Compute the test statistic

Using a paired T-test, we analyze the data to obtain a test statistic and a p-value.

The test statistic (e.g., T-statistic) is calculated based on the differences between blood pressure measurements before and after treatment.

t = m/(s/√n)

  • m  = mean of the differences d i ​= X after,i ​− X before,i
  • s  = standard deviation of the differences d i
  • n  = sample size

Then m = -3.9, s ≈ 1.37, and n = 10,

and we calculate T-statistic = -9 from the paired t-test formula.

Step 4: Find the p-value

With the calculated t-statistic of -9 and degrees of freedom df = 9, you can find the p-value using statistical software or a t-distribution table.

thus, p-value = 8.538051223166285e-06
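The quoted p-value can be reproduced directly from the t-distribution's survival function (two-tailed, t = -9, df = 9):

```python
from scipy import stats

t_stat, df = -9.0, 9

# Two-tailed p-value: probability of a statistic at least this extreme under H0
p_value = 2 * stats.t.sf(abs(t_stat), df)

print(p_value)  # about 8.54e-06
```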

Step 5: Result

  • If the p-value is less than or equal to 0.05, the researchers reject the null hypothesis.
  • If the p-value is greater than 0.05, they fail to reject the null hypothesis.

Conclusion: Since the p-value (8.538051223166285e-06) is less than the significance level (0.05), the researchers reject the null hypothesis. There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different.

Python Implementation of Case A

Let’s implement the hypothesis test in Python, testing whether a new drug affects blood pressure. For this example, we will use a paired T-test from the scipy.stats library.

SciPy is a scientific computing library for Python; here we rely on its statistical functions.

import numpy as np
from scipy import stats

# Data
before_treatment = np.array([120, 122, 118, 130, 125, 128, 115, 121, 123, 119])
after_treatment = np.array([115, 120, 112, 128, 122, 125, 110, 117, 119, 114])

# Step 1: Null and Alternate Hypotheses
# Null Hypothesis: The new drug has no effect on blood pressure.
# Alternate Hypothesis: The new drug has an effect on blood pressure.
null_hypothesis = "The new drug has no effect on blood pressure."
alternate_hypothesis = "The new drug has an effect on blood pressure."

# Step 2: Significance Level
alpha = 0.05

# Step 3: Paired T-test
t_statistic, p_value = stats.ttest_rel(after_treatment, before_treatment)

# Step 4: Calculate T-statistic manually
m = np.mean(after_treatment - before_treatment)
s = np.std(after_treatment - before_treatment, ddof=1)  # ddof=1 for sample standard deviation
n = len(before_treatment)
t_statistic_manual = m / (s / np.sqrt(n))

# Step 5: Decision
if p_value <= alpha:
    decision = "Reject"
else:
    decision = "Fail to reject"

# Conclusion
if decision == "Reject":
    conclusion = "There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different."
else:
    conclusion = "There is insufficient evidence to claim a significant difference in average blood pressure before and after treatment with the new drug."

# Display results
print("T-statistic (from scipy):", t_statistic)
print("P-value (from scipy):", p_value)
print("T-statistic (calculated manually):", t_statistic_manual)
print(f"Decision: {decision} the null hypothesis at alpha={alpha}.")
print("Conclusion:", conclusion)

T-statistic (from scipy): -9.0
P-value (from scipy): 8.538051223166285e-06
T-statistic (calculated manually): -9.0
Decision: Reject the null hypothesis at alpha=0.05.
Conclusion: There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different.

In the above example, given the T-statistic of approximately -9 and an extremely small p-value, the results indicate a strong case to reject the null hypothesis at a significance level of 0.05. 

  • The results suggest that the new drug, treatment, or intervention has a significant effect on lowering blood pressure.
  • The negative T-statistic indicates that the mean blood pressure after treatment is significantly lower than the assumed population mean before treatment.

Case B : Cholesterol level in a population

Data: A sample of 25 individuals is taken, and their cholesterol levels are measured.

Cholesterol Levels (mg/dL): 205, 198, 210, 190, 215, 205, 200, 192, 198, 205, 198, 202, 208, 200, 205, 198, 205, 210, 192, 205, 198, 205, 210, 192, 205.

Population Mean (claimed): 200 mg/dL

Population Standard Deviation (σ): 5 mg/dL(given for this problem)

Step 1: Define the Hypothesis

  • Null Hypothesis (H 0 ): The average cholesterol level in a population is 200 mg/dL.
  • Alternate Hypothesis (H 1 ): The average cholesterol level in a population is different from 200 mg/dL.

Step 2: Define the Significance Level

As the direction of deviation is not given, we assume a two-tailed test. Based on the normal distribution, the critical values for a significance level of 0.05 (two-tailed) can be read from the z-table and are approximately -1.96 and 1.96.

Step 3: Compute the Test Statistic

The sample mean works out to 202.04 mg/dL, so the test statistic is Z = [Tex](202.04 - 200) / (5 \div \sqrt{25}) [/Tex] = 2.04.

Step 4: Result

Since the absolute value of the test statistic (2.04) is greater than the critical value (1.96), we reject the null hypothesis and conclude that there is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.

Python Implementation of Case B

import scipy.stats as stats
import math
import numpy as np

# Given data
sample_data = np.array([205, 198, 210, 190, 215, 205, 200, 192, 198, 205,
                        198, 202, 208, 200, 205, 198, 205, 210, 192, 205,
                        198, 205, 210, 192, 205])
population_std_dev = 5
population_mean = 200
sample_size = len(sample_data)

# Step 1: Define the Hypotheses
# Null Hypothesis (H0): The average cholesterol level in a population is 200 mg/dL.
# Alternate Hypothesis (H1): The average cholesterol level in a population is different from 200 mg/dL.

# Step 2: Define the Significance Level
alpha = 0.05  # Two-tailed test

# Critical values for a significance level of 0.05 (two-tailed)
critical_value_left = stats.norm.ppf(alpha / 2)
critical_value_right = -critical_value_left

# Step 3: Compute the test statistic
sample_mean = sample_data.mean()
z_score = (sample_mean - population_mean) / \
    (population_std_dev / math.sqrt(sample_size))

# Step 4: Result
# Check if the absolute value of the test statistic is greater than the critical values
if abs(z_score) > max(abs(critical_value_left), abs(critical_value_right)):
    print("Reject the null hypothesis.")
    print("There is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.")
else:
    print("Fail to reject the null hypothesis.")
    print("There is not enough evidence to conclude that the average cholesterol level in the population is different from 200 mg/dL.")

Reject the null hypothesis. There is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.

Limitations of Hypothesis Testing

  • Although useful, hypothesis testing does not offer a comprehensive grasp of the topic being studied: it concentrates on specific hypotheses and statistical significance without fully reflecting the complexity or the whole context of the phenomenon.
  • The accuracy of hypothesis testing results is contingent on the quality of available data and the appropriateness of statistical methods used. Inaccurate data or poorly formulated hypotheses can lead to incorrect conclusions.
  • Relying solely on hypothesis testing may cause analysts to overlook significant patterns or relationships in the data that are not captured by the specific hypotheses being tested. This limitation underscores the importance of complementing hypothesis testing with other analytical approaches.

Hypothesis testing stands as a cornerstone in statistical analysis, enabling data scientists to navigate uncertainties and draw credible inferences from sample data. By systematically defining null and alternative hypotheses, choosing significance levels, and leveraging statistical tests, researchers can assess the validity of their assumptions. The article also elucidates the critical distinction between Type I and Type II errors, providing a comprehensive understanding of the nuanced decision-making process inherent in hypothesis testing. The real-life example of testing a new drug’s effect on blood pressure using a paired T-test showcases the practical application of these principles, underscoring the importance of statistical rigor in data-driven decision-making.

Frequently Asked Questions (FAQs)

1. What are the 3 types of hypothesis tests?

There are three types of hypothesis tests: right-tailed, left-tailed, and two-tailed. Right-tailed tests assess if a parameter is greater, left-tailed if lesser. Two-tailed tests check for non-directional differences, greater or lesser.

2. What are the 4 components of hypothesis testing?

  • Null Hypothesis ( [Tex]H_o [/Tex] ): No effect or difference exists.
  • Alternative Hypothesis ( [Tex]H_1 [/Tex] ): An effect or difference exists.
  • Significance Level ( [Tex]\alpha [/Tex] ): Risk of rejecting the null hypothesis when it is true (Type I error).
  • Test Statistic: Numerical value representing the observed evidence against the null hypothesis.

3. What is hypothesis testing in ML?

A statistical method to evaluate the performance and validity of machine learning models, testing specific hypotheses about model behavior, such as whether features influence predictions or whether a model generalizes well to unseen data.

4. What is the difference between Pytest and Hypothesis in Python?

Pytest is a general-purpose testing framework for Python code, while Hypothesis is a property-based testing framework for Python that generates test cases from specified properties of the code.



COMMENTS

  1. 8.1: The Elements of Hypothesis Testing

    Hypothesis testing is a statistical procedure in which a choice is made between a null hypothesis and an alternative hypothesis based on information in a sample. The end result of a hypotheses testing procedure is a choice of one of the following two possible conclusions: Reject H0. H 0. (and therefore accept Ha.

  2. 9.1: Introduction to Hypothesis Testing

    In hypothesis testing, the goal is to see if there is sufficient statistical evidence to reject a presumed null hypothesis in favor of a conjectured alternative hypothesis.The null hypothesis is usually denoted \(H_0\) while the alternative hypothesis is usually denoted \(H_1\). An hypothesis test is a statistical decision; the conclusion will either be to reject the null hypothesis in favor ...

  3. Hypothesis Testing

    There are 5 main steps in hypothesis testing: State your research hypothesis as a null hypothesis and alternate hypothesis (H o) and (H a or H 1 ). Collect data in a way designed to test the hypothesis. Perform an appropriate statistical test. Decide whether to reject or fail to reject your null hypothesis. Present the findings in your results ...

  4. The Elements of Hypothesis Testing

    Definition. Hypothesis testing is a statistical procedure in which a choice is made between a null hypothesis and an alternative hypothesis based on information in a sample. The end result of a hypotheses testing procedure is a choice of one of the following two possible conclusions: Reject H0 (and therefore accept Ha ), or.

  5. Statistical Hypothesis Testing Overview

    Hypothesis testing is a crucial procedure to perform when you want to make inferences about a population using a random sample. These inferences include estimating population properties such as the mean, differences between means, proportions, and the relationships between variables. This post provides an overview of statistical hypothesis testing.

  6. Hypothesis Testing

    The Four Steps in Hypothesis Testing. STEP 1: State the appropriate null and alternative hypotheses, Ho and Ha. STEP 2: Obtain a random sample, collect relevant data, and check whether the data meet the conditions under which the test can be used. If the conditions are met, summarize the data using a test statistic.

  7. Hypothesis Tests

    The key element of a statistical hypothesis test is the test statistic, which (like any statistic) is a function of the data. A test statistic takes our entire dataset, and reduces it to one number. This one number ideally should contain all the information in the data that is relevant for assessing the two hypotheses of interest, and exclude ...

  8. Introduction to Hypothesis Testing

    A hypothesis test consists of five steps: 1. State the hypotheses. State the null and alternative hypotheses. These two hypotheses need to be mutually exclusive, so if one is true then the other must be false. 2. Determine a significance level to use for the hypothesis. Decide on a significance level.

  9. 5: Hypothesis Testing

    Identify the similar elements between confidence intervals and hypothesis tests; 5.1 - Hypothesis Testing Overview 5.1 - Hypothesis Testing Overview ... In hypothesis testing, we refer to the presumption of innocence as the NULL HYPOTHESIS. So while the prosecutor has a research hypothesis, it must be shown that the presumption of innocence can ...

  10. Chapter 11: Fundamentals of Hypothesis Testing

    Hypothesis testing refers to the process of choosing between two hypothesis statements about a probability distribution based on observed data from the distribution. Hypothesis testing is a step-by-step methodology that allows you to make inferences about a population parameter by analyzing differences between the results observed (the sample statistic) and the results that can be expected if ...

  11. Statistical hypothesis test

    A statistical hypothesis test is a method of statistical inference used to decide whether the data sufficiently support a particular hypothesis. A statistical hypothesis test typically involves a calculation of a test statistic. Then a decision is made, either by comparing the test statistic to a critical value or equivalently by evaluating a p ...

  12. 6a.2

    Below these are summarized into six such steps to conducting a test of a hypothesis. Set up the hypotheses and check conditions: Each hypothesis test includes two hypotheses about the population. One is the null hypothesis, notated as H 0, which is a statement of a particular parameter value. This hypothesis is assumed to be true until there is ...

  13. PDF Statistics: Hypothesis Testing

    This handout will define the basic elements of hypothesis testing and provide the steps to perform hypothesis tests using the P-value method and the critical value method. Many statistics courses use statistical calculation tools; however, this handout is designed for manually computed formulas. Basics of Hypothesis Testing. All hypothesis ...

  14. PDF Hypothesis Testing: Basic Concepts

    A hypothesis test allows us to test the claim about the population and find out how likely it is to be true. The hypothesis test consists of several components; two statements, the null hypothesis and the alternative hypothesis, the test statistic and the critical value, which in turn give us the P-value and the rejection region ...

  15. 8: Testing Hypotheses

    8.1: The Elements of Hypothesis Testing A hypothesis about the value of a population parameter is an assertion about its value. As in the introductory example we will be concerned with testing the truth of two competing hypotheses, only one of which can be true. 8.2: Large Sample Tests for a Population Mean

  16. Hypothesis Testing

    Hypothesis testing is a technique that is used to verify whether the results of an experiment are statistically significant. It involves the setting up of a null hypothesis and an alternate hypothesis. There are three types of tests that can be conducted under hypothesis testing - z test, t test, and chi square test.

  17. Hypothesis Testing: 4 Steps and Example

    Hypothesis testing is an act in statistics whereby an analyst tests an assumption regarding a population parameter. The methodology employed by the analyst depends on the nature of the data used ...

  18. 8-1. The Elements of Hypothesis Testing

    A hypothesis about the value of a population parameter is an assertion about its value. As in the introductory example we will be concerned with testing the truth of two competing hypotheses, only one of which can be true. The null hypothesis, denoted H 0, is the statement about the population parameter that is assumed to be true unless there is convincing evidence to the contrary.

  19. What is Hypothesis Testing? Types and Methods

    A number of elements are involved in hypothesis testing - the significance level, the p-value, the test statistic, and the method of hypothesis testing. (Also read: Introduction to probability distributions ) A significant way to determine whether a hypothesis stands true or not is to verify the data samples and identify the plausible hypothesis among the null ...

  20. Ch. 10 BUS Statistics Flashcards

    Study with Quizlet and memorize flashcards containing terms like Which of the following are essential elements of hypothesis testing? Select all that apply., Which of the following statements of a test hypothesis uses the correct protocol for stating an alternate hypothesis? Select all that apply., Choose the best definition of "hypothesis" in the context of statistical analysis. and more.

  21. Lesson 85

    The null hypothesis (H 0) is what is assumed to be true before any evidence from data. It is usually the null situation that has to be disproved. "Null" has the meaning of "no effect" or "of no consequence." H 0 is identified with the hypothesis of no change from the current belief.

  22. 1.2: The 7-Step Process of Statistical Hypothesis Testing

    Step 7: Based on steps 5 and 6, draw a conclusion about H 0. If the F calculated from the data is larger than F α, then you are in the rejection region and you can reject the null hypothesis with a (1 − α) level of confidence. Note that modern statistical software condenses steps 6 and 7 by providing a p-value.

  23. Understanding Hypothesis Testing

    Defining Hypotheses. Null hypothesis (H 0): In statistics, the null hypothesis is a general statement or default position that there is no relationship between two measured cases or no relationship among groups. In other words, it is a basic assumption made based on knowledge of the problem. Example: A company's mean production is 50 units per day; H 0: μ = 50.
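The μ = 50 example can be worked end to end as a two-sided z-test, again using only the standard library. The daily production counts and the assumption that σ = 3 units is known are ours, made up purely to illustrate the snippet's H 0: μ = 50.

```python
import math
from statistics import mean

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0, with known population sigma.

    Returns (z, p_value, reject_H0)."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    # p-value from the standard normal CDF, via the error function
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    p = 2.0 * (1.0 - phi)
    return z, p, p < alpha

# Hypothetical daily production counts; sigma assumed known to be 3 units
days = [53, 49, 56, 52, 50, 55, 54, 51, 57, 48]
z, p, reject = one_sample_z_test(days, mu0=50, sigma=3)
print(z, p, reject)
```

Here the sample mean is 52.5, well above 50 relative to the sampling variability, so the small p-value leads to rejecting H 0: μ = 50 at the 0.05 level.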