To survive in the business world, business units today are forced to innovate and launch products in the market quickly. But this is easier said than done. Numerous factors must come together for this to materialise. Notable among them is cost, because a product launch is a well-planned and well-thought-out activity.
These activities include conducting market surveys. In plain terms, business units determine the feasibility of the new product within a limited area and then, based on the results, decide on a further course of action, i.e. whether to go ahead with the launch of the product or to drop the project altogether.
In other words, business units conduct sample surveys, i.e. they obtain the response from a small piece of the larger picture and then, based on the results from the small piece, estimate the likely response of the larger piece. The small piece is known as the sample and the larger piece is known as the population.
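As a rough illustration of how a sample stands in for a population, the following Python sketch draws a simple random sample from a made-up population and uses it to estimate the share of customers who would buy a product. The population size, the 30% share, the sample size and the function name `estimate_proportion` are all assumptions chosen for illustration, not figures from a real survey.

```python
import random

# Hypothetical population: 100,000 customers, 30% of whom would buy the product.
population = [1] * 30_000 + [0] * 70_000


def estimate_proportion(population, sample_size, seed=42):
    """Draw a simple random sample and return the share of 'would buy' answers."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size


true_share = sum(population) / len(population)
estimated_share = estimate_proportion(population, 1_000)
print(f"population share: {true_share:.3f}, sample estimate: {estimated_share:.3f}")
```

The sample estimate will usually be close to, but not exactly equal to, the population share; hypothesis testing is the tool for judging how much such a difference matters.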
Thus the concepts of sample and population play a vital role and assist the management in taking core decisions, which may or may not prove fruitful for the survival of the business. In order to take decisions based on the sample and to estimate the population parameters, business units are required to start with some assumption, or hypothesis, about the population. This assumption is then tested, meaning that the data are used to check whether the assumption they started with was correct or incorrect. Thus we have hypothesis testing.
Let us take an example to illustrate what has been said above. Suppose a business unit wants to bring a new product to the market which will increase its market share and hence its profitability. In this case, the hypothesis would be that the introduction of the new product will increase profitability, and the survey would be conducted on this basis. The analysis of the data will reveal whether the hypothesis was correct or incorrect.
This unit will cover the basics of hypotheses and their testing, including the steps required to test a hypothesis. It will also cover the types and characteristics of hypotheses.
After studying this unit, the reader will be able to:
Understand the basic concepts of hypothesis
Understand the various types and the characteristics of hypothesis
Understand the steps involved in the testing of hypothesis
Understand the two-tailed and the one-tailed tests involved in the testing of hypothesis
Understand the criteria for accepting or rejecting the hypothesis
Understand the manner in which decisions are to be taken on the basis of the results arrived at during the process of testing of hypothesis
6.2 Defining Hypothesis
In order to discuss the basics of hypothesis testing in detail, let us first define what is meant by a hypothesis.
Simply speaking, a hypothesis is a claim about the larger group (called the population), and hypothesis testing is the part of inferential statistics (i.e. the branch of statistics used to draw inferences from collected data) that tests such a claim using data collected from a smaller part, known as the sample. In other words, hypothesis testing is the process of evaluating a claim based on the values obtained from the sample.
Let us take an example to drive home the point.
A tire manufacturer claims that the average life of its tires is at least 70,000 km. We want to test the claim made by the manufacturer. The process we will adopt is to take a sample of tires and run them to see how many km they last on average. If the sample has lasted over 70,000 km on average, then we have reason to believe that the claim is correct and that the other tires the manufacturer produces will also last 70,000 km on average.
In arriving at this conclusion, we may commit one of the following errors:
We may incorrectly say “the tires do not last at least 70,000 km” when in fact they do
We may incorrectly say “the tires do last at least 70,000 km” when in fact they do not
Thus, we may commit errors when drawing a conclusion about the hypothesis we have formulated.
These errors are covered in Section 6.7 under Type I and Type II errors.
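The tire claim can be checked numerically along the following lines. This is a minimal sketch, not the method prescribed by this unit: it assumes a large-sample normal (z) approximation, whereas a t-test would normally be preferred for a sample as small as the one used here, and the tire lives listed are invented numbers. The function name `lower_tail_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def lower_tail_z_test(sample, claimed_mean):
    """Return (z, p_value) for H0: mean >= claimed_mean against H1: mean < claimed_mean."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    z = (mean(sample) - claimed_mean) / standard_error
    p_value = NormalDist().cdf(z)  # chance of a z this low or lower if H0 holds
    return z, p_value


# Hypothetical tire lives in km (illustrative values, not real test data).
tire_lives = [68_200, 69_500, 70_100, 67_900, 69_000,
              70_400, 68_800, 69_300, 67_600, 69_700]

z, p_value = lower_tail_z_test(tire_lives, 70_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject the manufacturer's claim at 5%:", p_value < 0.05)
```

A small p-value suggests the sample average is too far below 70,000 km to be explained by chance alone; either conclusion can still be one of the two errors listed above.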
6.3 Characteristics of Hypothesis
Having understood the definition of a hypothesis, let us now look at its characteristics. They are as follows.
A hypothesis is based on reasoning which appears to be justified
This simply means that the hypothesis we formulate should be based on previous research and should follow the most likely outcome, not the exceptional one. For example, we should form the hypothesis regarding the launch of a new product on the basis of previously analysed data, which prompted us to take further steps such as market research.
A hypothesis should provide a reasonable explanation for the outcome which is to be predicted
This means that the hypothesis should not focus on an unrealistic outcome, i.e. it should be based on a realistic scenario. For example, hypotheses such as “our new software will surpass the sales of the dealer who is leading the software market” or “our software will sell very well on the surface of the moon” are unrealistic.
A hypothesis should clearly state the relationship between the variables that are defined
This simply means that the hypothesis should not be vague. It should be stated in plain terms and in language which is simple to understand. For example, the hypothesis that the MIS report will be printed in somewhere around 3 to 4 minutes is ambiguous and confusing.
A hypothesis defines the variables in measurable terms
This means that the hypothesis specifies aspects such as who would be affected, who the players in the process are, and the like. For example, the hypothesis that the product will work correctly for 2 months for small children.
A hypothesis is testable in a given or sufficient amount of time
This means that the hypothesis can be tested within a finite amount of time. A hypothesis which cannot be tested within a finite amount of time will never be tested or accepted.
6.4 Types of Hypothesis
Having understood the basic terminology of hypotheses, let us now discuss the types of hypothesis. So far we have only scratched the surface, so let us now go into more detail.
Hypotheses are of various types. Some of them are discussed below.
Null hypothesis: This hypothesis is formulated when the statistician believes that there is no relationship between two variables, or when there is insufficient information to state a research hypothesis. It is denoted by H0.
Alternative hypothesis: This hypothesis is the opposite of the null hypothesis. It is formulated when the researcher believes that there is sufficient information to conclude that there is a relationship between the variables. It is denoted by H1 or Ha.
Simple hypothesis: This hypothesis predicts the relationship between one independent variable and one dependent variable. Both must be single variables.
Complex hypothesis: This hypothesis is used to predict the relationship between two or more independent variables and two or more dependent variables.
Examples of different types of hypothesis
Health-related education programmes influence the number of people who smoke
Newspapers affect people’s living standards
Absenteeism in classes affects exam scores
Lower levels of exercise are responsible for an increase in weight
6.5 Hypothesis Testing
Having understood the various types of hypothesis, let us dwell on the important topic of hypothesis testing. As stated above, hypothesis testing means that we verify a claim about the larger unit (the population) based on sample data and the results obtained by performing statistical tests on that data. Let us now look at the steps involved in the testing of a hypothesis. The steps are as follows:
Describe in a statement the population characteristic about which the hypothesis is to be tested
State the null hypothesis and denote it as H0
State the alternative hypothesis and denote it as H1 or Ha
Identify and state the test statistic that will be used
Identify the rejection region
Decide whether it lies in the upper tail, the lower tail, or both tails (a two-tailed test)
Determine the critical value associated with α, the level of significance at which the test is to be conducted
Compute the quantities in the test statistic
State the conclusion based on the computed statistic, i.e. decide whether to reject the null hypothesis H0 in favour of the alternative hypothesis or not to reject it. The conclusion depends on the level of significance of the test.
Figure 1 provides a graphical view of the steps involved in the testing of hypothesis
Figure 1 Steps involved in the testing of hypothesis
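The later steps of the procedure (choosing the rejection region, finding the critical value for α, and stating the conclusion) can be sketched in Python as follows. The sketch assumes that the test statistic is a z value following the standard normal distribution under H0; the function names `critical_value` and `reject_null` and the tail labels are hypothetical choices made for this illustration.

```python
from statistics import NormalDist


def critical_value(alpha, tail):
    """Critical z value for a given level of significance and tail type."""
    standard_normal = NormalDist()
    if tail == "two":
        # alpha is split equally between the two tails
        return standard_normal.inv_cdf(1 - alpha / 2)
    if tail in ("upper", "lower"):
        return standard_normal.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'upper' or 'lower'")


def reject_null(z, alpha=0.05, tail="two"):
    """Return True when the test statistic z falls in the rejection region."""
    cutoff = critical_value(alpha, tail)
    if tail == "two":
        return abs(z) > cutoff
    if tail == "upper":
        return z > cutoff
    return z < -cutoff  # lower-tailed test


# The same statistic can lead to different conclusions depending on the tail.
print(reject_null(1.8, alpha=0.05, tail="two"))
print(reject_null(1.8, alpha=0.05, tail="upper"))
```

The design keeps the critical value and the decision separate so that the effect of changing α or the tail type can be inspected on its own.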
6.6 Difference between Null Hypothesis and Alternative Hypothesis
In the previous sections we covered the basics of the null hypothesis and the alternative hypothesis. Let us now discuss the differences between these two types of hypothesis. The differences are as follows:
The null hypothesis states the default position that is put to the test, while the alternative hypothesis describes the other possible outcomes. For example, in line with the definitions in Section 6.4, the null hypothesis would be that A is not related to B, while the alternative hypothesis would be that A is related to B in some way, meaning that A may be the teacher of B, the mentor of B and so on
The alternative hypothesis can be negative, but it is not necessarily a negation of the null hypothesis; rather, it provides a means of finding out whether the null hypothesis holds, i.e. whether it should be accepted or rejected
The alternative hypothesis provides an opportunity to look at other possibilities, whereas the null hypothesis concerns only the presence or absence of the effect in question. When we deal with the null hypothesis our focus is restricted, while in the case of the alternative hypothesis our focus needs to be wider
6.7 Decision Rule
Decision rules are the procedures that enable us to determine whether the findings from the observed sample differ significantly from the results that were expected, and thus help us to decide whether to accept or reject a hypothesis. They are also called rules of decision.
Let us take an example to illustrate the decision rule. Suppose that we toss a coin 50 times and get heads 42 times, and that our null hypothesis is that the coin is fair. In this scenario, there is sufficient reason to believe, based on the outcome, that the coin is biased, although we may be wrong. The observations contradict our hypothesis, and we are in a dilemma as to whether to accept or reject it. Procedures which assist us in deciding whether to accept or reject the hypothesis when there is a significant difference between the observed and the expected results are known as decision rules.
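One way to turn the coin example into an explicit decision rule is to find the number of heads beyond which the result would be too unlikely for a fair coin. The sketch below is an illustration under stated assumptions: it uses an upper-tailed rule at the 5% level (both choices are assumptions, since the text does not fix them), and the function names `upper_tail` and `rejection_threshold` are hypothetical.

```python
from math import comb


def upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def rejection_threshold(n, alpha=0.05):
    """Smallest head count c with P(X >= c) <= alpha when the coin is fair."""
    for c in range(n + 2):
        if upper_tail(n, c) <= alpha:
            return c


threshold = rejection_threshold(50)
observed_heads = 42
print("decision rule: reject 'the coin is fair' if heads >=", threshold)
print("probability of 42 or more heads with a fair coin:", upper_tail(50, 42))
print("reject H0:", observed_heads >= threshold)
```

With 42 heads the observed count lies well inside the rejection region, which matches the intuition in the text that the coin appears biased.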
Type I and Type II errors
It is in situations like the above that we may commit errors, which are classified as Type I or Type II errors.
A Type I error occurs when we reject the null hypothesis when it should have been accepted, i.e. when it is in fact true
A Type II error occurs when we accept the null hypothesis when it should have been rejected, i.e. when it is in fact false
In both cases a wrong decision has been made. Hence, it is imperative that we minimize such errors while making decisions.
Level of Significance
While testing a given hypothesis, the maximum risk that we are willing to take of committing a Type I error is called the level of significance of the test. It is denoted by the Greek letter alpha (α). It is decided beforehand so that the results do not influence our choice.
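For a concrete decision rule, both error probabilities can be computed exactly. The following sketch assumes a coin tossed 10 times, a rule that rejects "the coin is fair" when the number of heads is at most 1 or at least 9, and a hypothetical biased coin with p = 0.8 as the alternative; all of these numbers, and the function names, are assumptions made for illustration.

```python
from math import comb


def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def error_rates(n, low, high, p_alt):
    """Rule: reject 'coin is fair' when heads <= low or heads >= high.

    Returns (type_one, type_two):
      type_one = P(reject | coin is fair, p = 0.5)
      type_two = P(do not reject | coin has heads probability p_alt)
    """
    reject = [k for k in range(n + 1) if k <= low or k >= high]
    accept = [k for k in range(n + 1) if low < k < high]
    type_one = sum(binom_pmf(n, k, 0.5) for k in reject)
    type_two = sum(binom_pmf(n, k, p_alt) for k in accept)
    return type_one, type_two


alpha, beta = error_rates(n=10, low=1, high=9, p_alt=0.8)
print(f"Type I error probability (alpha): {alpha:.4f}")
print(f"Type II error probability (beta) when p = 0.8: {beta:.4f}")
```

Widening the rejection region lowers the Type II error probability but raises the Type I error probability, which is why the level of significance is fixed before the data are examined.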
6.8 Two tailed and one tailed tests
In order to understand the concept of two tailed and one tailed tests, consider the following scenario. Let us have a null hypothesis H0 and an alternative hypothesis H1. We want to conduct the test and determine whether we should reject the null hypothesis in favour of alternative hypothesis.
Thus, we have two different types of test which can be performed, viz. the one-tailed test and the two-tailed test.
A one-tailed test looks for an increase or a decrease in the parameter under consideration, while a two-tailed test looks for any change in the parameter.
We can carry out the test at any level; 1%, 5% and 10% are the common levels. For example, when we perform the test at the 5% level, it means that there is a 5% chance of wrongly rejecting H0, the null hypothesis. If we perform the test at the 5% level and decide to reject the null hypothesis, we say that “there is significant evidence at the 5% level to suggest that the null hypothesis is false”.
For a one-tailed test we choose a critical region that has just one part. If the sample value lies in this region, we reject the null hypothesis in favour of the alternative.
Suppose, for instance, that we want to look for a definite decrease. Then the critical region will be to the left. Remember that in such a one-tailed test the observed value can be as high as you like without falling in the critical region.
Suppose we have a Poisson distribution and we want to carry out a hypothesis test on its mean, based upon a single sample observation of 3.
Suppose the hypotheses are:
H0: λ = 9
H1: λ < 9
We want to test whether it is “reasonable” for the observed value of 3 to have come from a Poisson distribution with a parameter value of 9. What is the probability that a value as low as 3 comes from a Poisson distribution with parameter 9?
P(X ≤ 3) = 0.0212 (this has been obtained from the Poisson table)
The probability is less than 0.05, which means that if the mean really were 9 there would be less than a 5% chance of observing a value as low as 3 from a Poisson(9) distribution. The null hypothesis should therefore be rejected in favour of the alternative at the 5% level.
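The table value used above can be reproduced directly from the Poisson probability formula. The sketch below is only a check of the arithmetic; the helper name `poisson_cdf` is a hypothetical choice.

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


p_value = poisson_cdf(3, 9)
print(f"P(X <= 3) = {p_value:.4f}")  # agrees with the table value 0.0212
print("reject H0 at the 5% level:", p_value < 0.05)
```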
In a two-tailed test, we look for either an increase or a decrease. For example, H0 might be that the mean is equal to 9 (as before). This time, however, H1 would be that the mean is not equal to 9. In this case, therefore, the critical region has two parts, one in each tail of the distribution.
Let us test the parameter p of a Binomial distribution at the 10% level.
Suppose a coin is tossed 10 times and we get 7 heads. We want to test whether or not the coin is fair. If the coin is fair, p = 0.5. Put this as the null hypothesis:
H0: p = 0.5
H1: p ≠ 0.5
Because this is a two-tailed test, the critical region also has two parts: half of the critical region is in the right tail and the other half is in the left tail. So the critical region contains both the top 5% of the distribution and the bottom 5% of the distribution (as we are testing at the 10% level).
If H0 is true, X ~ Bin(10, 0.5).
If the null hypothesis is true, what is the probability that X is 7 or above?
P(X ≥ 7) = 1 - P(X < 7) = 1 - P(X ≤ 6) = 1 - 0.8281 = 0.1719
Is this in the critical region? No, because the probability that X is at least 7 is not less than 0.05 (5%), which is what we would need it to be.
So there is no significant evidence to reject the null hypothesis at the 10% level of significance.
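The binomial calculation above can be verified with a few lines of Python. This is a sketch of the same arithmetic, with the 10% level split equally between the two tails as described in the text; the helper name `binom_upper_tail` is hypothetical.

```python
from math import comb


def binom_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


alpha = 0.10                                # two-tailed test at the 10% level
p_upper = binom_upper_tail(10, 7, 0.5)      # P(X >= 7) when the coin is fair
in_critical_region = p_upper < alpha / 2    # each tail holds 5%

print(f"P(X >= 7) = {p_upper:.4f}")         # 0.1719, as in the text
print("reject H0:", in_critical_region)
```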
6.9 Procedure of Hypothesis testing
6.11 Terminal Questions
What is meant by mutually exclusive events?
The probability that Mr. Puneet will solve the problem is 0.75. The probability that Mr. Aneesh will solve the problem is 0.25. What is the probability that a given problem will be solved?
A box contains 4 green and 5 white balls. What is the probability of selecting at random two balls having
The same color
Different colors
The probability that a contractor will get a plumbing contract is 2/3 and the probability that he will not get an electrical contract is 5/9. If the probability of getting at least one of these contracts is 4/5, what is the probability that he will get both?
A can solve 90 percent of the problems given in a book and B can solve 70 percent. What is the probability that at least one of them will solve a problem selected at random?
5.14 Answers Self Assessment Questions
2. 1/6; ½; 1/3
3. 1/ 3; 2/3; 5/9
4. ¼; 1/13; ½; 2/13; 4/13
5.15 Answers Terminal Assessment Questions
1. Refer to glossary
3. (i)4/9, (ii)5/9
Aggregate It is the collection of small units which results in one complete entity. For example the aggregation of the total inhabitants of towns and villages and mega cities results in the population of the country
Alpha Level The probability that a statistical test will find a significant difference between groups when in fact there is none. This is also referred to as the probability of making a Type I error or as the significance level of a statistical test. A lower alpha level is better than a higher alpha level, with all else equal.
Alternative Hypothesis The experimental hypothesis stating that there is some real difference between two or more groups. It is the alternative to the null hypothesis, which states that there is no difference between groups.
Analysis of Variance (ANOVA) A statistical test that determines whether the means of two or more groups are significantly different.
Association A relationship between objects or variables.
Average A single value (mean, median, mode) representing the typical, normal, or middle value of a set of data.
Axiom A statement widely accepted as truth.
Bell-Shaped Curve A curve characteristic of a normal distribution, which is symmetrical about the mean and extends infinitely in both directions. The area under curve=1.0.
Beta Level The probability of making an error when comparing groups and stating that differences between the groups are the result of the chance variations when in reality the differences are the result of the experimental manipulation or intervention. Also referred to as the probability of making a Type II error.
Between-Group Variance A measure of the difference between the means of various groups.
Between-Subject Design Experimental design in which a different group of subjects is used for each level of the variable under study.
Bias Influences that distort the results of a research study.
Categorical Data Variables with discrete, non-numeric or qualitative categories (e.g. gender or marital status). The categories can be given numerical codes, but they cannot be ranked, added, multiplied or measured against each other. Also referred to as nominal data.
Causal Analysis An analysis that seeks to establish the cause and effect relationships between variables.
Central Tendency A measure that describes the “typical” or average characteristic; the three main measures of central tendency are mean, median and mode.
Coefficient of Determination A coefficient, ranging between 0 and 1, that indicates the goodness of fit of a regression model.
Comparability The quality of two or more objects that can be evaluated for their similarity and differences.
Confidence Interval A range of estimated values that is the best guess as to the true population’s value. Confidence intervals are usually calculated for the sample mean. In behavioral research, the acceptable level of confidence is usually 95%. Statistically, this means that if 100 random samples were drawn from a population and confidence intervals were calculated for the mean of each of the samples, 95 of the confidence intervals would contain the population’s mean. For example, a 95% confidence interval for IQ of 95 to 105, indicates with 95% certainty that the actual average IQ in the population lies between 95 and 105.
Confidence Level The percentage of times that a confidence interval will include the true population value. If the confidence level is .95 this means that if a researcher were to randomly sample a population 100 times, 95% of the time the estimated confidence interval for a value will contain the population’s true value. In other words, the researcher can be 95% confident that the confidence interval contains the true population value.
Consistency The process in surveys whereby a question should be answered similarly to previous questions.
Constant A value that stays the same for all the units of an analysis. For instance, in a research study that explores fathers’ involvement in their children’s lives, gender would be constant, as all subjects (units of analysis) are male.
Construct A concept. A theoretical creation that cannot be directly observed.
Construct Validity The degree to which a variable, test, questionnaire or instrument measures the theoretical concept that the researcher hopes to measure. For example, if a researcher is interested in the theoretical concept of “marital satisfaction,” and the researcher uses a questionnaire to measure marital satisfaction, if the questionnaire has construct validity it is considered to be a good measure of marital satisfaction.
Continuous Variable A variable that, in theory, can take on any value within a range. The opposite of continuous is discrete. For example, a person’s height could be 5 feet 1 inch, 5 feet 1.1 inches, 5 feet 1.11 inches, and so on, thus it is continuous. One’s gender is either “male” or “female”, thus it is discrete.
Correlation The degree to which two variables are associated. Variables are positively correlated if they both tend to increase at the same time. For example, height and weight are positively correlated because as height increases weight also tends to increases. Variables are negatively correlated if as one increases the other decreases. For example, number of police officers in a community and crime rates are negatively correlated because as the number of police officers increases the crime rate tends to decrease.
Correlation Coefficient A measure of the degree to which two variables are related. A correlation coefficient is always between -1 and +1. If the correlation coefficient is between 0 and +1 then the variables are positively correlated. If the correlation coefficient is between 0 and -1 then the variables are negatively correlated.
Cross-Sectional Data Data collected about individuals at only one point in time. This is contrasted with longitudinal data, which is collected from the same individuals at more than one point in time.
Cross-Tabulation A method to display the relationship between two categorical variables. A table is created with the values of one variable across the top and the values of the second variable down the side. The number of observations that correspond to each cell of the table are indicated in each of the table cells.
Data Information collected through surveys, interviews, or observations. Statistics are produced from data, and data must be processed to be of practical use.
Data Analysis The process by which data are organized to better understand patterns of behavior within the target population. Data analysis is an umbrella term that refers to many particular forms of analysis such as content analysis, cost-benefit analysis, network analysis, path analysis, regression analysis, etc.
Data Collection The observation, measurement, and recording of information in a research study.
Deductive Method A method of study that begins with a theory and the generation of a hypothesis that can be tested through the collection of data, and ultimately lead to the confirmation (or lack thereof) of the original theory.
Degrees of Freedom The number of independent units of information in a sample used in the estimation of a parameter or calculation of a statistic. The degrees of freedom limit the number of variables that can be included in a statistical model. Models with similar explanatory power, but more degrees of freedom are generally preferred because they offer a simpler explanation.
Dependent Variable The outcome variable. In experimental research, this variable is expected to depend on a predictor (or independent) variable.
Descriptive Statistics Basic statistics used to describe and summarize data. Descriptive statistics generally include measures of the average values of variables (mean, median, and mode) and measures of the dispersion of variables (variance, standard deviation, or range).
Dichotomous Variables Variables that have only two categories, such as gender (male and female).
Direct Effect The effect of one variable on another variable, without any intervening variables.
Direct Observation A method of gathering data primarily through close visual inspection of a natural setting. Direct observation does not involve actively engaging members of a setting in conversations or interviews. Rather, the direct observer strives to be unobtrusive and detached from the setting.
Discrete Variables A variable that can assume only a finite number of values; it consists of separate, indivisible categories. The opposite of discrete is continuous. For example, one’s gender is either “male” or “female”, thus gender is discrete. A person’s height could be 5 feet 1 inch, 5 feet 1.1 inches, 5 feet 1.11 inches, and so on, thus it is continuous.
Dispersion The spread of a variable’s values. Techniques that describe dispersion include range, variance, standard deviation, and skew.
Distribution The frequency with which values of a variable occur in a sample or a population. To graph a distribution, first the values of the variable are listed across the bottom of the graph. The number of times each value occurs is listed up the side of the graph. A bar is drawn that corresponds to how many times each value occurred in the data. For example, a graph of the distribution of women’s heights from a random sample of the population would be shaped like a bell. Most women’s heights are around 5’4″. This value would occur most frequently, so it would have the highest bar. Heights that are close to 5’4″, such as 5’3″ and 5’5″, would have slightly shorter bars. More extreme heights, such as 4’7″ and 6’1″, would have very short bars.
Error The difference between the actual observed data value and the predicted or estimated data value. Predicted or estimated data values are calculated in statistical analyses, such as regression analysis.
Estimation The process by which data from a sample are used to indicate the value of an unknown quantity in a population.
Evaluation Research The use of scientific research methods to plan intervention programs, to monitor the implementation of new programs and the operation of existing programs, and to determine how effectively programs or clinical practices achieve their goals.
Exploratory Study A study that aims to identify relationships between variables when there are no predetermined expectations as to the nature of those relations. Many variables are often taken into account and compared, using a variety of techniques in the search for patterns.
Extrapolation Predicting the value of unknown data points by projecting beyond the range of known data points.
Factor Analysis An exploratory form of multivariate analysis that takes a large number of variables or objects and aims to identify a small number of factors that explain the interrelations among the variables or objects.
Focus Group An interview conducted with a small group of people, all at one time, to explore ideas on a particular topic. The goal of a focus group is to uncover additional information through participants’ exchange of ideas.
Forecasting The prediction of the size of a future quantity (e.g., unemployment rate next year).
Histogram A visual presentation of data that shows the frequencies with which each value of a variable occurs. Each value of a variable typically is displayed along the bottom of a histogram, and a bar is drawn for each value. The height of the bar corresponds to the frequency with which that value occurs.
Hypothesis A statement that predicts the relationship between the independent (causal) and dependent (outcome) variables.
Hypothesis Testing Statistical tests to determine whether a hypothesis is accepted or rejected. In hypothesis testing, two hypotheses are used: the null hypothesis and the alternative hypothesis. The alternative hypothesis is the hypothesis of interest; it generally states that there is a relationship between two variables. The null hypothesis states the opposite, that there is no relationship between two variables.
In-depth Interviewing A research method in which face-to-face interviews with respondents are conducted using