Different stressors led to higher perceived stress levels among undergraduate students. As stated by Amponsah and Owolabi (2011), environmental changes, academic problems and social challenges such as living away from home for the first time were the common stressors for college students. However, academic stress was the top source of stress among undergraduate students (Elias, Ping, & Abdullah, 2011).
In turn, these stressors could lead to both physiological and behavioral problems such as immune system disorders and psychological disorders (Zainiyah, Afiq, Chow, & Sara, 2011). Research has shown that stress among students may also lead to physical problems, psychological distress, behavioral problems and even poor academic performance (Sharma & Kaur, 2011).
Context of Study
Several studies have examined the relationship between optimism and perceived stress level. As highlighted by Rabiega and Cannon (n.d.), optimism is the belief in positive future outcomes. Optimism is also known as a buffer against hopelessness (Jit-Ho, n.d.). Both physical and psychological well-being could be influenced by optimistic thinking.
As proposed by Diener and Chan (2011), optimism could improve health and even lead to longevity. Rasmussen, Wrosch, Scheier, and Carver (2006) also found that optimists had better outcomes than pessimists when they experienced threats to health, as they reported less distress. Overall, optimism could bring benefits in different life situations, especially in minimizing stress levels (Shearman et al., 2011).
In a sample of undergraduate students in the southeastern United States, Cann, Stilwell, and Taku (n.d.) revealed that individuals who possessed optimism, happiness and hope, supported by a good sense of humor, had a lower level of perceived stress. Furthermore, optimism was associated with positive mood and attitude that would lead to a stress-free lifestyle (Daraei & Ghaderi, 2012). Saklofske, Austin, Mastoras, Beaton, and Osborne (2012) supported the view that positive affect could serve as a buffer against stress while negative affect made people more vulnerable to stress.
According to Shearman et al. (2011), high levels of optimism have a positive influence on coping with stressful situations. University students who have psychological strengths such as optimism tend to adopt problem-focused coping strategies (Khan et al., 2011). Wang and Yeh (2005) explained that problem-focused coping involved optimistic action and social support, which were helpful in alleviating psychological distress and minimizing the negative impacts of stress. When the perceived stress level was moderate or lower, problem-focused coping became more adaptive. However, emotion-focused coping strategies such as avoidance and emotional were used at a higher perceived stress level.
Moreover, optimistic students reported better academic performance (Medlin & Faulk, n.d.). According to Gurol and Kerimgil (2010), there was a relationship between academic optimism and school success. Three properties, known as academic emphasis, collective efficacy, and trust in students and parents, shaped the culture of academic optimism. Among these three properties, academic emphasis could facilitate students’ learning, which in turn encouraged achievement. Generally, academic optimism was positively associated with academic performance (Bressler, Bressler, & Bressler, 2010). Academic optimism also positively influenced students’ academic achievement despite socioeconomic status (Gurol & Kerimgil, 2010).
Furthermore, Lounsbury, Levy, Park, Gibson, and Smith (2009) demonstrated the relationship between optimism, self-directed learning and academic performance among the students in their study. The results showed that self-directed learning was linked to optimism. In fact, these students tended to perform better academically. Mustafa, Elias, Noah, and Roslan (2010) further proposed that the positive relationship between students’ engagement in learning and academic success could be explained by integrating flow theory into the model of motivation. It was possible that academic success was caused by the subjective experience of high concentration and the capability to master the task, as well as enjoyment of learning.
Generally, optimism was associated with better coping during stressful situations (Fernandez-Castro, Rovira, Doval, & Edo, 2009). Findings have shown that university students with optimism tend to adopt problem-focused coping strategies (Khan et al., 2011). In turn, these strategies could help them to manage stressful events during university life and even lead to better adjustment and academic success (Abdullah, Elias, Uli, & Mahyuddin, 2010). Apparently, optimism was the best strategy in any situation, as individuals could gain resources for achieving goals and be more open to new experiences (Forgeard & Seligman, 2012).
Many students experienced stress during their university life (Khan et al., 2011). According to The Stars (Feb, 2011), the National Suicide Registry Malaysia (NSRM) has revealed that the inability of teenagers and young adults to cope with stress from school, work, family issues and relationship problems could lead to suicide. In fact, long-term stress was associated with anxiety and depression (Smith, Segal, & Segal, 2012).
Asiaone (March, 2012) also reported that a Universiti Sains Malaysia (USM) final year student was suspected to have committed suicide because he was disappointed with his results. Apparently, there is an increase in suicide cases among undergraduate students in Malaysia because of stress. Therefore, it is crucial to examine the stress issue in order to prevent suicidal behaviors among students. The rise of stress among undergraduate students is an issue that must not be overlooked, as it could negatively affect students in different ways. By identifying this research problem, more information could be obtained on how optimism influences both the stress level and the academic achievement of undergraduate students.
Significance of Study
Different kinds of stress could be experienced by undergraduate students. They have to deal with challenging academic work, explore their career options and build good social relationships with others. Since stress has become a major concern among undergraduate students, it is important to conduct research on stress and the aspects that are linked to it.
This study aims to explore the relationship between optimism and stress among undergraduate students in Universiti Tunku Abdul Rahman (UTAR). The present study can show whether optimism as a personality trait acts as a buffer against stress. It also gives a clearer picture of how optimism plays a vital role in reducing stress among undergraduate students. Apart from that, people can gain a more comprehensive understanding of the benefits of optimism and develop helpful intervention programmes so that students can handle their stress in an effective manner.
In addition, this study aims to investigate the relationship between optimism and academic achievement among undergraduate students in UTAR. Since students are valuable assets of the country, it is crucial to understand the factors that contribute to academic success. The present study can also offer insight into the relationship between optimism and academic achievement.
Purpose of Study
This study aims to explore the relationship between optimism and stress among undergraduate students in UTAR. It also aims to investigate the relationship between optimism and academic achievement. Importantly, the present study can demonstrate the importance of optimism in an academic setting. This study is essential for future research to develop intervention programmes that can help undergraduate students to manage their stress and improve their academic performance.
The research questions of this study are as follows:
Is there any significant relationship between optimism and stress among undergraduate students in Universiti Tunku Abdul Rahman (UTAR)?
Is there any significant relationship between optimism and academic achievement among undergraduate students in Universiti Tunku Abdul Rahman (UTAR)?
Optimism. The definition offered by Conversano et al. (2010) proposed that the term ‘optimism’ consists of two interrelated concepts, which are the tendency to hope and the tendency to believe that we live in “the best of all possible worlds”. According to Ashraf, Jaffri, Sharif, and Khan (2012), optimism can be expressed as a human behavior and attitude in which an individual holds a belief in positive outcomes in every situation.
Stress. There were different views of stress. Smith et al. (2012) defined stress as a normal physical reaction which activates the “fight-or-flight” response during a threatening event. It also refers to the stress that is perceived or experienced by people (Jit-Ho, n.d.). In the present study, the definition of stress was based on Cohen and Williamson (1988), who stated that perceived stress depends on how unpredictable, uncontrollable and overloaded the life of an individual is. Stress resulted from threatening and demanding events as well as inadequate coping resources to deal with the threat or demand.
Undergraduate. A university or college student who has not yet obtained a first degree, particularly a bachelor’s degree (Dictionary.com, 2012). The undergraduate students in this study were selected from UTAR.
Resiliency. According to Garg and Rastogi (2009), the capacity to bounce back after setbacks, to be adjustable and to restore the sense of vitality is known as resiliency. It involves successful adaptation to challenging life experiences, particularly under stressful conditions or traumatic events, and the ability to recover from these experiences in a positive way.
Optimism and Psychological Well-being
As proposed by Cann et al. (n.d.), the way an individual interprets an event could affect his or her psychological adjustment. In particular, an individual with an optimistic interpretive style would be more resilient to stress (Garg & Rastogi, 2009). Research has also shown that optimism was a negative predictor of psychological ill health (Rothmann & Essenko, 2007).
In a sample of undergraduate students in Poland, Posadzki, Musonda, Debska, and Polczyk (2009) found that depressive symptoms could be alleviated by optimism. Moreover, there was a positive relationship between optimism and students’ quality of life. This finding was supported by Karademas (2006), who revealed that optimism was negatively associated with depressive symptomatology and positively associated with satisfaction with life. In addition, optimism could strengthen the belief in one’s own value and foster active coping, which resulted in less burnout during early career (Salmela-Aro, Tolvanen, & Nurmi, 2009). Overall, previous findings have indicated that optimism was associated with greater psychological well-being.
Optimism, Belief and Stress
As pointed out by Urbig and Monsen (2012), dispositional optimism was associated with a specific belief structure. If the optimism of an individual was based on control and self-efficacy, he or she would be more likely to use active coping. Consequently, there would be an increased perceived chance of success. In contrast, passive coping would be used if the optimism of the individual was based on a belief in good luck. Generally, optimists believed that their problems were temporary and that they could conquer the obstacles (Kumcagiz, Celik, Yilmaz, & Eren, 2011).
Research conducted by Lee and Bradley (n.d.) has shown that international students with greater optimistic beliefs about their own capability had a lower acculturative stress level. They also showed more assertiveness compared to less optimistic students. They were more likely to engage in challenging tasks and exerted greater effort to conquer hardships. When positive experiences and constructive self-belief were fostered in students, they could also adjust better to college life (DeAndrea, Ellison, LaRose, Steinfield, & Fiore, 2011). From these findings, it could be concluded that optimism was associated with positive beliefs that would alleviate the stress level.
Optimism, Meaning of Education, Coping and Stress
Using a sample of undergraduate students, Krypel and Henderson-King (2010) examined the meaning of education and its relationship with optimism, coping and students’ perceived stress. Optimism could keep students away from self-defeating views of education, such as perceiving education as a stressor. Optimism also motivated them to view challenges in a positive way and to maintain their responsibilities as students. As a result, optimistic students were more actively involved in education and even showed more persistence in pursuing education. Likewise, Bressler, Bressler, and Bressler (n.d.) stated that students with higher optimism may show more persistence in completing their degree program.
Apart from that, Krypel and Henderson-King (2010) pointed out that optimistic students were able to develop a more productive approach to dealing with stress. There was also a positive correlation between a positive meaning of education and problem-focused coping, which involved positive reinterpretation of stressful events and active coping. Consequently, the perception of stress was lower. In short, optimism was linked to a positive meaning of education and adaptive coping skills that could reduce the stress level.
Optimism, Coping and Stress
Conversano et al. (2010) revealed that optimism was positively correlated with coping strategies. As proposed by Carver, Scheier, and Segerstrom (2010), optimists adopted problem-focused coping during threatening events. They did not merely ignore threats but attempted to minimize the risks by attending to them in a more selective manner. Therefore, optimists experienced less distress during adversity. Similarly, Rasmussen et al. (2006) found that optimists were more likely to adopt problem-focused coping strategies during adversity. However, when problem-focused coping was not feasible, they would apply emotion-focused strategies such as acceptance, humor and positive reframing.
As highlighted by Greenglass and Fiksenbaum (2009), positive affect could promote proactive coping, which included goal setting and self-efficacy. Such proactive coping could provide resources for self-improvement and would be activated under stressful conditions. In a sample of university students, positive affect could regulate the effect of proactive coping on depression. A study by DeAndrea et al. (2011) also stated that students who had an optimistic outlook adopted better coping strategies, which facilitated their adjustment to a new environment. Taken together, these findings suggested that optimistic individuals tend to apply effective coping skills such as problem-focused coping and proactive coping to resist stress.
Optimism, Resilience and Stress
Jit-Ho (n.d.) investigated the impacts of daily hassles and resilience on both physical and psychological well-being among undergraduate students in Hong Kong. The results showed that there was a positive association between optimism and resilience, which in turn led to better health and a lower stress level. During stressful periods, resilient people adapted better.
In addition, Steinhardt and Dolbier (2008) found a positive relationship between resilience and effective coping strategies. People with high resiliency adopted more problem-solving strategies. Ebrahimi, Keykhosrovani, Dehghani, and Javdan (2012) even revealed that resiliency was positively correlated with mental health among university students. Conversely, it was negatively correlated with stress and depression. Therefore, optimism was correlated with resiliency, which in turn led to lower stress levels among university students.
Optimism, Self-efficacy and Stress
There was an association between optimism and self-efficacy (Ashraf et al., 2012). According to Ahmed, Qazi, and Jabeen (2011), the optimistic self-reliance of an individual was known as self-efficacy. Self-efficacy could motivate people to set goals and achieve them. Luszczynska, Gutierrez-Dona, and Schwarzer (2005) also supported the view that individuals with high self-efficacy set higher goals and were more persistent in attaining them. They preferred challenging tasks and remained highly committed to their own goals even when there were setbacks. Importantly, self-efficacy could help people to bounce back after adversity (Ahmed et al., 2011).
In a study among undergraduate students, Posadzki et al. (2009) revealed that self-efficacy could affect their coping strategies. Luszczynska et al. (2005) also highlighted that self-efficacy allowed individuals to cope with stressful events in a more effective manner. In turn, students with high self-efficacy showed less stress and fewer adjustment problems (Lee & Bradley, n.d.). Generally, optimistic individuals have higher self-efficacy and are thus able to cope with stress effectively.
Optimism, Social Support and Stress
Mosher, Prelow, Chen, and Yackel (2006) conducted a study to investigate the potential mediators of the relation of optimism to depressive symptoms among college students. Findings indicated that optimistic students perceived greater social support. In fact, higher optimism and greater social support were associated with less depressive symptomatology among students.
As explained by Vollmann, Renner, and Weber (2007), optimists have a higher tendency to be provided with instrumental support for striving toward their goals. They also possessed positive personality and interpersonal attraction. Thus, they elicited positive social responses from others and were inclined to generalize these responses. As a result, they received more social support from others. Based on these findings, optimists tend to get more social support that could help to buffer them against stress.
Optimism and Academic Achievement
As pointed out by Luszczynska et al. (2005), individuals with positive expectancy and optimistic beliefs could be more motivated. Such motivational beliefs could lead to successful academic adjustment (Cazan, 2012). In fact, motivation was a significant factor that led to academic success beyond prior performance and intelligence (Steinmayr & Spinath, 2009). According to Mustafa et al. (2010), both intrinsic motivation and extrinsic motivation would motivate students to learn. However, students with intrinsic motivation performed better in the face of challenges.
Another study by Kumcagiz et al. (2011) found a positive correlation between the optimism of students and their levels of emotional intelligence. As highlighted by MacCann, Fogarty, Zeidner, and Roberts (2011), there was a positive relationship between emotional intelligence and academic achievement. Specifically, more problem-focused coping strategies were used by individuals with higher emotional management, which resulted in better academic achievement.
Ogundokun and Adeyemo (2010) further explained that students with emotional intelligence were able to regulate their feelings, solve problems and develop excellent interpersonal skills, which were associated with academic achievement. Importantly, they showed fewer negative emotions and thus could avoid distraction in learning (MacCann et al., 2011). With good emotional management, students could even cope well with stress and anxiety due to examinations (Ogundokun & Adeyemo, 2010).
Furthermore, there was an association between optimism and self-efficacy (Ashraf et al., 2012). As discussed by Ozan, Gundogdu, Bay, and Celkan (2012), students with a higher sense of efficacy have higher aspirations and greater flexibility in finding new solutions. They also showed better intellectual performance compared to students of equal cognitive ability who lacked efficacy. Such efficacy beliefs could increase one’s motivation and promote strategic thinking, which in turn led to achievement. Indeed, self-efficacy was one of the components of motivation that contributed most to performance (Katz & Shoshani, n.d.).
In addition, Robinson and Snipes (2009) proposed that hope, optimism and self-efficacy were expectancy beliefs that developed a system of competence and control, which in turn led to academic success. Specifically, students who had high self-efficacy possessed both hope agency and hope pathways, in that they were motivated to attain goals and to identify alternatives during adversity. They were also optimistic about their plans. Hence, such an interactive system of beliefs would result in better academic performance and coping skills.
Theoretical Framework of Optimism
Optimism falls under positive psychology, a branch that places more emphasis on the ability to conquer hardship than on the person’s pathology (Rabiega & Cannon, n.d.). According to Sumer, Giannotta, Settanni, and Ciairano (2009), optimism could be defined as the expectation of the best possible outcomes.
Seligman (1998) proposed an explanatory style model to explain optimism. In this model, optimists were those who attributed negative events to external, unstable, and specific causes while attributing positive events to personal and pervasive causes. Hence, optimistic individuals were more likely to focus on positive events and to be more capable of handling future situations. In addition, individuals with a positive explanatory style put more effort into achieving goals, as they believed in their capability to reduce the discrepancy between goals and the current situation (as cited in Kluemper, Little, & DeGroot, 2009).
On the other hand, Carver and Scheier (1981) proposed a self-regulatory model which explained how the self-regulatory nature of optimism could affect outcomes. In this theory, goals direct human behavior. An assessment would be initiated when an individual became aware of a discrepancy between a goal and the current situation. More effort would be exerted to achieve the goal if the individual perceived that this discrepancy could be minimized (as cited in Kluemper et al., 2009).
Apart from that, Scheier and Carver (1985) highlighted “dispositional optimism” in their study. This theory proposed that optimists have more positive attitudes about life. They also showed more protective attitudes and higher resiliency during stressful events, and they implemented more useful coping skills. Therefore, there was a positive correlation between optimism and both physical and mental well-being (as cited in Conversano et al., 2010).
This chapter focuses on several aspects, including the research design, participants and location, instruments, research procedures and data analysis. A survey method with convenience sampling was implemented in the current study. The participants were Year 3 Psychology students from Universiti Tunku Abdul Rahman (UTAR), which is located in Kampar, Perak. The instruments used were the Life Orientation Test-Revised (LOT-R) to assess the level of optimism and the Perceived Stress Scale, 10 item version (PSS-10) to measure the stress level. In addition, the Cumulative Grade Point Average (CGPA) of students was obtained to assess their academic achievement.
Survey. Research design could be defined as the overall plan of researchers for the practical implementation of a project (Draper, 2004). In the current study, a quantitative research design was used. Tewksbury (2009) defined quantitative research as a scientific investigation in social science based on specific definitions and careful operationalization of concepts and variables. It involves hypothesis testing by using statistical methods (Wood & Welch, 2010). A survey was the quantitative research method in the present study, and questionnaires were distributed for data collection. A self-report survey measure was implemented because it saved time and could assess the thoughts and feelings of participants. Convenience sampling was used as the sampling method. Convenience sampling is a nonprobability sampling method in which participants are selected based on their availability and willingness to take part in the study (Shaughnessy, Zechmeister, & Zechmeister, 2009).
Participants and Location
In the present study, the participants were Year 3 Psychology students from the Faculty of Art and Social Science (FAS) in UTAR. A total of 100 participants were selected by using convenience sampling. Year 3 students were selected as the sample because they were in their final year of study and had to deal with different stressors such as academic stress, stress from doing their FYP and stress from planning for their future after graduation. In fact, the most stressed group among undergraduate students was the final year students (Elias et al., 2011).
In particular, Psychology students were selected as the sample because they were vulnerable to stress. A study by Collins (2010) stated that students from the Medicine, Law, Mechanical Engineering and Psychology faculties have a higher tendency to develop anxiety and depression. Surprisingly, there was a higher tendency toward suicidal thoughts and previous suicide attempts among psychology students compared to medical students (Kavalidou, 2013).
This study was conducted at UTAR, which is located in Kampar, Perak. Specifically, the survey was carried out in the UTAR lecture hall, IDK 6.
The first instrument used in the current study was the Life Orientation Test-Revised (LOT-R), which assessed the level of optimism among undergraduate students. According to Scheier, Carver, and Bridges (1994), the LOT-R is used to measure dispositional optimism and pessimism.
There were three types of items in the LOT-R: four filler items that were not scored, three positively worded items and three negatively worded items. Items 2, 5, 6 and 8 were filler items, while items 3, 7 and 9 were the reverse-scored items. One of the sample items in the LOT-R was “In uncertain times, I usually expect the best”. The participants were required to choose the appropriate response on a Likert scale with five possible choices. The scale ranges from 0 (strongly disagree) to 4 (strongly agree). To obtain the total score, the scores for items 3, 7 and 9 were reversed and items 1, 3, 4, 7, 9 and 10 were then summed. The possible range of the total score is 0-24. There were three levels of optimism, namely high optimism (19-24), moderate optimism (14-18) and low optimism (0-13).
As pointed out by Scheier et al. (1994), the test-retest reliability of the LOT-R was r = .68 at 4 months, r = .60 at 12 months, r = .56 at 24 months and r = .79 at 28 months. The Cronbach’s alpha was .78. In short, the LOT-R was fairly constant across time, with high internal consistency and test-retest reliability.
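The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example respondent are hypothetical, and the three bands are the cut-offs used in this study rather than part of the LOT-R itself.

```python
# Item numbers follow the LOT-R layout described in the text (1-indexed).
LOT_R_SCORED_ITEMS = (1, 3, 4, 7, 9, 10)
LOT_R_REVERSED_ITEMS = (3, 7, 9)
LOT_R_MAX_RESPONSE = 4  # 0 = strongly disagree ... 4 = strongly agree


def score_lot_r(responses):
    """Return the LOT-R total (0-24) from a dict of item number -> response."""
    total = 0
    for item in LOT_R_SCORED_ITEMS:
        value = responses[item]
        if not 0 <= value <= LOT_R_MAX_RESPONSE:
            raise ValueError(f"item {item} has out-of-range response {value}")
        # Reverse-scored items are flipped so that higher means more optimistic.
        if item in LOT_R_REVERSED_ITEMS:
            value = LOT_R_MAX_RESPONSE - value
        total += value
    return total


def optimism_level(total):
    """Map a LOT-R total to the three bands used in this study."""
    if total >= 19:
        return "high"
    if total >= 14:
        return "moderate"
    return "low"


# Hypothetical respondent; filler items 2, 5, 6 and 8 are present but ignored.
example = {1: 3, 2: 2, 3: 1, 4: 4, 5: 3, 6: 2, 7: 0, 8: 1, 9: 1, 10: 3}
lot_r_total = score_lot_r(example)
print(lot_r_total, optimism_level(lot_r_total))
```

Keeping the item lists as named constants makes it easy to check the code against the questionnaire.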
Apart from that, the Perceived Stress Scale, 10 item version (PSS-10) was used in the current study to measure the level of stress among undergraduate students. According to Cohen and Williamson (1988), the original PSS was developed by Cohen in 1983 and is used to measure the extent to which respondents report their lives as unpredictable, uncontrollable and overloaded. The PSS-10 was used in this study because it was easier to score. Moreover, the factor structure and internal reliability of the PSS-10 were slightly better than those of the original PSS, with a Cronbach’s alpha coefficient of internal reliability of 0.78 (Cohen & Williamson, 1988).
In the PSS-10, the questions were about the thoughts and feelings of participants during the last month. One of the sample items was “In the past month, how often have you been upset because of something that happened unexpectedly?” The participants were required to choose the appropriate response on a Likert scale with five possible choices. The scale ranges from 0 (never) to 4 (very often). To obtain the total score, the scores on the four positively stated items (items 4, 5, 7 and 8) were reversed and all 10 items were then summed. The possible range of the total score is 0-40. Higher scores indicated higher perceived stress. There were three perceived stress levels, namely low stress (0-13), moderate stress (14-26) and high stress (27-40).
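As a hedged sketch of the PSS-10 scoring just described, the following Python snippet reverses items 4, 5, 7 and 8 and sums all ten responses. The function names and the example responses are hypothetical, and the bands are the ones used in this study.

```python
PSS_REVERSED_ITEMS = (4, 5, 7, 8)  # the positively stated items
PSS_MAX_RESPONSE = 4               # 0 = never ... 4 = very often


def score_pss10(responses):
    """Return the PSS-10 total (0-40) from a list of ten responses, item 1 first."""
    if len(responses) != 10:
        raise ValueError("PSS-10 needs exactly ten responses")
    total = 0
    for item, value in enumerate(responses, start=1):
        if not 0 <= value <= PSS_MAX_RESPONSE:
            raise ValueError(f"item {item} has out-of-range response {value}")
        # Positively stated items are reversed so that higher means more stress.
        if item in PSS_REVERSED_ITEMS:
            value = PSS_MAX_RESPONSE - value
        total += value
    return total


def stress_level(total):
    """Map a PSS-10 total to the three bands used in this study."""
    if total >= 27:
        return "high"
    if total >= 14:
        return "moderate"
    return "low"


# Hypothetical respondent.
pss_example = [2, 3, 1, 3, 2, 2, 1, 3, 2, 1]
pss_total = score_pss10(pss_example)
print(pss_total, stress_level(pss_total))
```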
Furthermore, the academic achievement of participants was measured based on CGPA. The CGPA of each participant was classified according to the grading system of the undergraduate degree examination in UTAR. There were four classifications of honours: first class with CGPA 3.5-4.0, second class (upper division) with CGPA 3.0-3.49, second class (lower division) with CGPA 2.2-2.99 and third class with CGPA 2.0-2.19. In this grading system, the lowest CGPA was 2.0 while the highest was 4.0.
The research proposal was revised and approved by the research supervisor, Ms Annie Margaret a/p Sandela Raran, before the questionnaires were distributed. In addition, the Statistical Package for Social Science (SPSS) was learned in an SPSS tutorial class organized by a senior, William Hooi, on 29th November, 2012 in the D210 counselling room.
The survey was conducted on 30th November, 2012 at 9.00am, during the last week of study, in order to assess how stressed the students had been in the last month, when they were facing academic stress such as preparation for examinations, submission of assignments and presentations. In addition, as final year students they needed to handle stress from doing their FYP. The survey was carried out in the UTAR lecture hall, IDK 6. The questionnaires were administered to the Year 3 Psychology students. Participation was voluntary and all responses remained anonymous. The procedure ended at 12.00pm on the same day. All questionnaires were collected successfully from the 100 participants. Lastly, the responses were assessed in Chapter IV Findings and Analysis, and the results were analyzed in Chapter V Discussion and Conclusion.
Academic achievement, item scores and total scores were summarized in a descriptive analysis which included frequency, percentage, mean and standard deviation. The Statistical Package for Social Science (SPSS) program was used to compute Pearson correlations. Correlation analysis was done to examine the relationship between students’ optimism and stress level in the current study. Furthermore, correlation analysis was done to determine the relationship between level of optimism and academic achievement. A p value below 0.05 was considered statistically significant.
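The classification above can be expressed as a small lookup, sketched here in Python. The function name is hypothetical, and the use of lower-bound thresholds (so that a value such as 3.495 falls in the upper second class) is an assumption, since the bands are stated to two decimal places only.

```python
def classify_cgpa(cgpa):
    """Return the honours class for a CGPA between 2.0 and 4.0 inclusive."""
    if not 2.0 <= cgpa <= 4.0:
        raise ValueError("CGPA must lie between 2.0 and 4.0")
    # Thresholds are checked from the highest band downwards.
    if cgpa >= 3.5:
        return "first class"
    if cgpa >= 3.0:
        return "second class (upper division)"
    if cgpa >= 2.2:
        return "second class (lower division)"
    return "third class"


# Hypothetical CGPA values, one per band.
for value in (3.72, 3.10, 2.45, 2.05):
    print(value, classify_cgpa(value))
```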
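For readers without SPSS, the Pearson correlation can be sketched in plain Python as below. This is an illustration under stated assumptions, not the analysis actually run: the function names and the six pairs of scores are hypothetical. The sketch stops at the t statistic; converting it to a p value requires the t distribution with n - 2 degrees of freedom, which a statistics package would supply.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length with at least 3 pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Return the t statistic for testing r against zero (df = n - 2)."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical LOT-R and PSS-10 totals for six respondents.
optimism_scores = [20, 15, 12, 18, 9, 22]
stress_scores = [12, 20, 25, 15, 30, 10]
r = pearson_r(optimism_scores, stress_scores)
print(round(r, 3), round(t_statistic(r, len(optimism_scores)), 3))
```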
Findings and Analysis
The findings and analysis of the current study focus on two major parts: descriptive statistics and inferential statistics. For descriptive statistics, the results for participants’ academic achievement, item scores and total scores are presented as frequency, percentage, mean and standard deviation. For inferential statistics, the relationships between the independent variable (optimism) and the dependent variables (stress and academic achievement) were computed using Pearson correlation.
Descriptive statistics for participants’ academic achievement, optimism and stress
Columns: CGPA (%), Optimism (%), Stress (%)
The basic difference between the objectives of data summarization and data reduction depends upon the ultimate research question. In data summarization the ultimate research question may be to better understand the interrelationship among the variables. This may be accomplished by condensing a large number of respondents into a smaller number of distinctly different groups with Q-type factor analysis. More often data summarization is applied to variables in R-type factor analysis to identify the dimensions that are latent within a dataset. Data summarization makes the identification and understanding of these underlying dimensions or factors the ultimate research question.
Data reduction relies on the identification of the dimensions as well, but makes use of the discovery of the items that comprise the dimensions to reduce the data to fewer variables that represent the latent dimensions. This is accomplished by the use of surrogate variables, summated scales, or factor scores. Once the data have been reduced to a smaller number of variables, further analysis may become easier to perform and interpret.
(2) HOW CAN FACTOR ANALYSIS HELP THE RESEARCHER IMPROVE THE RESULTS OF OTHER MULTIVARIATE TECHNIQUES?
Factor analysis provides direct insight into the interrelationships among variables or respondents through its data summarizing perspective. This gives the researcher a clear picture of which variables are highly correlated and will act in concert in other analysis. The summarization may also lead to a better understanding of the latent dimensions underlying a research question that is ultimately being answered with another technique. From a data reduction perspective, the factor analysis results allow the formation of surrogate or summated variables to represent the original variables in a way that avoids problems associated with highly correlated variables. In addition, the proper usage of scales can enrich the research process by allowing the measurement and analysis of concepts that require more than single item measures.
(3) WHAT GUIDELINES CAN YOU USE TO DETERMINE THE NUMBER OF FACTORS TO EXTRACT? EXPLAIN EACH BRIEFLY.
The appropriate guidelines utilized depend to some extent upon the research question and what is known about the number of factors that should be present in the data. If the researcher knows the number of factors that should be present, then the number to extract may be specified in the beginning of the analysis by the a priori criterion. If the research question is largely to explain a minimum amount of variance then the percentage of variance criterion may be most important.
When the objective of the research is to determine the number of latent factors underlying a set of variables, a combination of criteria, possibly including the a priori and percentage of variance criteria, may be used in selecting the final number of factors. The latent root criterion is the most commonly used technique. This technique is to extract the number of factors having eigenvalues greater than 1, the rationale being that a factor should explain at least as much variance as a single variable. A related technique is the scree test criterion. To develop this test, the latent roots (eigenvalues) are plotted against the number of factors in their order of extraction. The resulting plot shows an elbow in the sloped line where the unique variance begins to dominate common variance. The scree test criterion usually indicates more factors than the latent root rule. One of these four criteria for the initial number of factors to be extracted should be specified. Then an initial solution and several trial solutions are calculated. These solutions are rotated and the factor structure is examined for meaning. The factor structure that best represents the data and explains an acceptable amount of variance is retained as the final solution.
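As a rough illustration of two of these criteria, the sketch below applies the latent root rule and the percentage of variance rule to a list of eigenvalues. The eigenvalues are hypothetical, the function names are invented for this example, and the 60 percent target is an assumed threshold, not one stated in the text.

```python
def latent_root_count(eigenvalues):
    """Latent root criterion: count factors with an eigenvalue greater than 1."""
    return sum(1 for ev in eigenvalues if ev > 1.0)

def percent_variance_count(eigenvalues, target=0.60):
    """Smallest number of factors whose cumulative variance share reaches target."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += ev
        if cumulative / total >= target:
            return k
    return len(eigenvalues)

# Hypothetical eigenvalues for six standardized variables (they sum to 6).
eigenvalues = [2.7, 1.5, 0.8, 0.5, 0.3, 0.2]

print(latent_root_count(eigenvalues))             # factors with eigenvalue > 1
print(percent_variance_count(eigenvalues, 0.60))  # factors needed for 60% of variance
print(percent_variance_count(eigenvalues, 0.80))  # factors needed for 80% of variance
```

Raising the variance target can call for more factors than the latent root rule retains, which is one reason the criteria are usually compared instead of applied in isolation.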
(4) HOW DO YOU USE THE FACTOR-LOADING MATRIX TO INTERPRET THE MEANING OF FACTORS?
The first step in interpreting the factor-loading matrix is to identify the largest significant loading of each variable on a factor. This is done by moving horizontally across the factor matrix and underlining the highest significant loading for each variable. Once completed for each variable the researcher continues to look for other significant loadings. If there is simple structure, only single significant loadings for each variable, then the factors are labeled. Variables with high factor loadings are considered more important than variables with lower factor loadings in the interpretation phase. In general, factor names will be assigned in such a way as to express the variables which load most significantly on the factor.
(5) HOW AND WHEN SHOULD YOU USE FACTOR SCORES IN CONJUNCTION WITH OTHER MULTIVARIATE STATISTICAL TECHNIQUES?
When the analyst is interested in creating an entirely new set of a smaller number of composite variables to replace either in part or completely the original set of variables, then the analyst would compute factor scores for use as such composite variables. Factor scores are composite measures for each factor representing each subject. The original raw data measurements and the factor analysis results are utilized to compute factor scores for each individual. Factor scores may not replicate as easily as a summated scale; therefore this must be considered in their use.
(6) WHAT ARE THE DIFFERENCES BETWEEN FACTOR SCORES AND SUMMATED SCALES? WHEN ARE EACH MOST APPROPRIATE?
The key difference between the two is that the factor score is computed based on the factor loadings of all variables loading on a factor, whereas the summated scale is calculated by combining only selected variables. Thus, the factor score is characterized by not only the variables that load highly on a factor, but also those that have lower loadings. The summated scale represents only those variables that load highly on the factor.
Although both summated scales and factor scores are composite measures, there are differences that lead to certain advantages and disadvantages for each method. Factor scores have the advantage of representing a composite of all variables loading on a factor. This is also a disadvantage in that it makes interpretation and replication more difficult. Also, factor scores can retain orthogonality whereas summated scales may not remain orthogonal. The key advantage of summated scales is that, by including only those variables that load highly on a factor, they make interpretation and replication easier. Therefore, the decision rule would be that if data are used only in the original sample or orthogonality must be maintained, factor scores are suitable. If generalizability or transferability is desired, then summated scales are preferred.
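The contrast between the two composites can be sketched as follows. This is only an illustration under stated assumptions: the loadings and responses are invented, the 0.5 cutoff is an assumed threshold, and the loading-weighted average is a crude stand-in for a factor score, which statistical packages normally compute from factor score coefficients on standardized data.

```python
def summated_scale(responses, loadings, cutoff=0.5):
    """Average only the items whose absolute loading meets the cutoff."""
    kept = [r for r, l in zip(responses, loadings) if abs(l) >= cutoff]
    return sum(kept) / len(kept)

def weighted_composite(responses, loadings):
    """Loading-weighted average over ALL items (rough proxy for a factor score)."""
    weighted_sum = sum(r * l for r, l in zip(responses, loadings))
    return weighted_sum / sum(abs(l) for l in loadings)

# Hypothetical loadings of five items on one factor and one respondent's answers.
loadings = [0.8, 0.7, 0.6, 0.2, 0.1]
responses = [4, 5, 3, 1, 2]

print(summated_scale(responses, loadings))      # uses the three high-loading items
print(weighted_composite(responses, loadings))  # every item contributes
```

The summated scale depends only on which items pass the cutoff, so it can be rebuilt in a new sample without rerunning the factor analysis; the weighted composite changes whenever the loadings change.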
(7) WHAT IS THE DIFFERENCE BETWEEN Q-TYPE FACTOR ANALYSIS AND CLUSTER ANALYSIS?
Both Q-Type factor analysis and cluster analysis compare a series of responses to a number of variables and place the respondents into several groups. The difference is that the resulting groups for a Q-type factor analysis would be based on the intercorrelations between the means and standard deviations of the respondents. In a typical cluster analysis approach, groupings would be based on a distance measure between the respondents’ scores on the variables being analyzed.
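A small sketch may make the contrast concrete. It assumes that the correlation-based grouping can be represented by the Pearson correlation between two respondents' answer profiles and the distance-based grouping by Euclidean distance; the three respondents and the function names are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Correlation between two respondents' profiles across the same variables."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def euclidean(x, y):
    """Straight-line distance between two respondents' scores."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical respondents rated on five variables.
a = [1, 2, 3, 4, 5]
b = [5, 6, 7, 8, 9]   # same pattern as a, but every score is 4 points higher
c = [2, 1, 4, 3, 5]   # similar level to a, slightly different pattern

print(pearson_r(a, b), euclidean(a, b))
print(pearson_r(a, c), euclidean(a, c))
```

Respondent b matches a perfectly on pattern yet sits far away in distance, while c is close in distance but less correlated, so the two approaches can pair a with different partners.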
(8) WHEN WOULD THE RESEARCHER USE AN OBLIQUE ROTATION INSTEAD OF AN ORTHOGONAL ROTATION? WHAT ARE THE BASIC DIFFERENCES BETWEEN THEM?
In an orthogonal factor rotation, the correlation between the factor axes is arbitrarily set at zero and the factors are assumed to be independent. This simplifies the mathematical procedures. In oblique factor rotation, the angles between axes are allowed to seek their own values, which depend on the density of variable clusterings. Thus, oblique rotation is more flexible and more realistic (it allows for correlation of underlying dimensions) than orthogonal rotation although it is more demanding mathematically. In fact, there is yet no consensus on a best technique for oblique rotation.
When the objective is to utilize the factor results in a subsequent statistical analysis, the analyst may wish to select an orthogonal rotation procedure. This is because the factors are orthogonal (independent) and therefore eliminate collinearity. However, if the analyst is simply interested in obtaining theoretically meaningful constructs or dimensions, the oblique factor rotation may be more desirable because it is theoretically and empirically more realistic.
Multiple Regression Analysis
ANSWERS TO QUESTIONS
(1) HOW WOULD YOU EXPLAIN THE “RELATIVE IMPORTANCE” OF THE PREDICTOR VARIABLES USED IN A REGRESSION EQUATION?
Two approaches: (a) beta coefficients and (b) the order that variables enter the equation in stepwise regression. Either approach must be used cautiously, being particularly concerned with the problems caused by multi-collinearity.
With regard to beta coefficients, they are the regression coefficients which are derived from standardized data. Their value is basically that we no longer have the problem of different units of measure. Thus, they reflect the impact on the criterion variable of a change of one standard deviation in any predictor variable. They should be used only as a guide to the relative importance of the predictor variables included in your equation, and only over the range of sample data included.
When using stepwise regression, the partial correlation coefficients are used to identify the sequence in which variables will enter the equation and thus their relative contribution.
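The rescaling behind a beta coefficient can be sketched as below, using the standard relation beta = b × (standard deviation of the predictor ÷ standard deviation of the criterion). The data and the unstandardized coefficients are hypothetical values chosen for illustration; they were not fitted to the data.

```python
import statistics

def beta_coefficient(b, x_values, y_values):
    """Standardized (beta) coefficient: b scaled by sd(x) / sd(y)."""
    return b * statistics.stdev(x_values) / statistics.stdev(y_values)

# Hypothetical predictors measured in very different units.
income = [30000, 40000, 50000, 60000, 70000]   # dollars
years = [1, 3, 2, 5, 4]                        # years of experience
outcome = [10, 14, 15, 21, 20]                 # criterion variable

# Hypothetical unstandardized coefficients (assumed, not estimated here).
b_income = 0.0002
b_years = 1.0

beta_income = beta_coefficient(b_income, income, outcome)
beta_years = beta_coefficient(b_years, years, outcome)
print(f"beta_income = {beta_income:.3f}, beta_years = {beta_years:.3f}")
```

Although the raw coefficient for income is tiny because income is measured in dollars, its beta is the larger of the two once both predictors are expressed in standard deviation units.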
(2) WHY IS IT IMPORTANT TO EXAMINE THE ASSUMPTION OF LINEARITY WHEN USING REGRESSION?
The regression model is constructed with the assumption of a linear relationship between the predictor variables and the criterion variable. This gives the model the properties of additivity and homogeneity. Hence coefficients express directly the effect of changes in predictor variables. When the assumption of linearity is violated, a variety of conditions can occur such as multicollinearity, heteroscedasticity, or serial correlation (due to non-independence of error terms). All of these conditions require correction before statistical inferences of any validity can be made from a regression equation.
Basically, the linearity assumption should be examined because if the data are not linear, the regression results are not valid.
(3) HOW CAN NONLINEARITY BE CORRECTED OR ACCOUNTED FOR IN THE REGRESSION EQUATION?
Nonlinearity may be corrected or accounted for in the regression equation by three general methods. One way is through a direct data transformation of the original variable as discussed in Chapter 2. Two additional ways are to explicitly model the nonlinear relationship in the regression equation through the use of polynomials and/or interaction terms. Polynomials are power transformations that may be used to represent quadratic, cubic, or higher order polynomials in the regression equation. The advantage of polynomials over direct data transformations is that polynomials allow testing of the type of nonlinear relationship. Another method of representing nonlinear relationships is through the use of an interaction or moderator term for two independent variables. Inclusion of this type of term in the regression equation allows the slope of the relationship of one independent variable to change across values of a second independent variable.
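Written out, the polynomial and interaction approaches take roughly the following form. The symbols (Y for the criterion, X_1 and X_2 for the predictors, b_0 to b_3 for the coefficients, e for the error term) are notation assumed for this illustration.

```latex
% Polynomial (quadratic) term: curvature in X_1
Y = b_0 + b_1 X_1 + b_2 X_1^2 + e

% Interaction (moderator) term between X_1 and X_2
Y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2 + e

% With the interaction term, the slope of X_1 depends on the value of X_2
\frac{\partial Y}{\partial X_1} = b_1 + b_3 X_2
```

Testing whether b_2 in the first equation, or b_3 in the second, differs from zero is what allows the type of nonlinearity to be assessed.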
(4) COULD YOU FIND A REGRESSION EQUATION THAT WOULD BE ACCEPTABLE AS STATISTICALLY SIGNIFICANT AND YET OFFER NO ACCEPTABLE INTERPRETATIONAL VALUE TO MANAGEMENT?
Yes. For example, with a sufficiently large sample size you could obtain a significant relationship but a very small coefficient of determination, one too small to be of value.
In addition, there are some basic assumptions associated with the use of the regression model which, if violated, could make any obtained results spurious at best. One of the assumptions is that the conditions and relationships existing when sample data were obtained remain unchanged. If changes have occurred, they should be accommodated before any new inferences are made. Another is that there is a “relevant range” for any regression model. This range is determined by the predictor variable values used to construct the model. In using the model, predictor values should fall within this relevant range. Finally, there are statistical considerations. For example, the effect of multicollinearity among predictor variables is one such consideration.
(5) WHAT IS THE DIFFERENCE IN INTERPRETATION BETWEEN THE REGRESSION COEFFICIENTS ASSOCIATED WITH INTERVAL SCALED PREDICTOR VARIABLES AS OPPOSED TO DUMMY (0,1) PREDICTOR VARIABLES?
The use of dummy variables in regression analysis is structured so that there are (n-1) dummy variables included in the equation (where n = the number of categories being considered). In the dichotomous case, then, since n = 2, there is one variable in the equation. This variable has a value of one or zero depending on the category being expressed (e.g., male = 0, female = 1). In the equation, the dichotomous variable will be included when its value is one and omitted when its value is zero. When dichotomous predictor variables are used, the intercept (constant) coefficient (b0) estimates the average effect of the omitted dichotomous variables. The other coefficients, b1 through bk, represent the average differences between the omitted dichotomous variables and the included dichotomous variables. These coefficients (b1-bk), then, represent the average importance of the two categories in predicting the dependent variable.
Coefficients b0 through bk serve a different function when metric predictors are used. With metric predictors, the intercept (b0) serves to locate the point where the regression equation crosses the Y axis, and the other coefficients (b1-bk) indicate the effect of the predictor variable(s) on the criterion variable (if any).
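The dummy-variable interpretation can be checked with a one-predictor least-squares fit. The sketch below uses invented data and a hypothetical helper, simple_ols; with a single 0/1 predictor, the intercept equals the mean of the group coded 0 and the slope equals the difference between the two group means.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: group is coded 0 (omitted category) or 1 (included category).
group = [0, 0, 0, 1, 1, 1]
score = [10, 12, 14, 18, 20, 22]

b0, b1 = simple_ols(group, score)
mean_group0 = sum(s for g, s in zip(group, score) if g == 0) / 3
mean_group1 = sum(s for g, s in zip(group, score) if g == 1) / 3
print(b0, mean_group0)                 # intercept vs. mean of the omitted category
print(b1, mean_group1 - mean_group0)   # slope vs. difference in group means
```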
(6) WHAT ARE THE DIFFERENCES BETWEEN INTERACTIVE AND CORRELATED PREDICTOR VARIABLES? DO ANY OF THESE DIFFERENCES AFFECT YOUR INTERPRETATION OF THE REGRESSION EQUATION?
The term interactive predictor variable is used to describe a situation where two predictor variables’ functions intersect within the relevant range of the problem. The effect of this interaction is that over part of the relevant range one predictor variable may be considerably more important than the other; but over another part of the relevant range the second predictor variable may become the more important. When interactive effects are encountered, the coefficients actually represent averages of effects across values of the predictors rather than a constant level of effect. Thus, discrete ranges of influence can be misinterpreted as continuous effects.
When predictor variables are highly correlated, there can be no real gain in adding both of the variables to the predictor equation. In this case, the predictor with the highest simple correlation to the criterion variable would be used in the predictive equation. Since the direction and magnitude of change is highly related for the two predictors, the addition of the second predictor will produce little, if any, gain in predictive power.
When correlated predictors exist, the coefficients of the predictors are a function of their correlation. In this case, little value can be associated with the coefficients since we are speaking of two simultaneous changes.
(7) ARE INFLUENTIAL CASES ALWAYS TO BE OMITTED? GIVE EXAMPLES OF WHEN THEY SHOULD AND SHOULD NOT BE OMITTED?
The principal reason for identifying influential observations is to address one question: Are the influential observations valid representations of the population of interest? Influential observations, whether they be “good” or “bad,” can occur because of one of four reasons. Omission or correction is easily decided upon in one case, the case of an observation with some form of error (e.g., data entry).
However, with the other causes, the answer is not so obvious. A valid but exceptional observation may be excluded if it is the result of an extraordinary situation. The researcher must decide if the situation is one which can occur among the population, thus a representative observation. In the remaining two instances (an ordinary observation exceptional in its combination of characteristics or an exceptional observation with no likely explanation), the researcher has no absolute guidelines. The objective is to assess the likelihood of the observation occurring in the population. Theoretical or conceptual justification is much preferable to a decision based solely on empirical considerations.
Multiple Discriminant Analysis
ANSWERS TO QUESTIONS
(1) HOW WOULD YOU DIFFERENTIATE BETWEEN MULTIPLE DISCRIMINANT ANALYSIS, REGRESSION ANALYSIS, AND ANALYSIS OF VARIANCE?
Basically, the difference lies in the number of independent and dependent variables and in the way in which these variables are measured. Note the following definitions:
Multiple discriminant analysis (MDA) – the single dependent (criterion) variable is nonmetric and the independent (predictor) variables are metric.
Regression Analysis – both the single dependent variable and the multiple independent variables are metric.
Analysis of Variance (ANOVA) – the multiple dependent variables are metric and the single independent variable is nonmetric.
(2) WHEN WOULD YOU EMPLOY LOGISTIC REGRESSION RATHER THAN DISCRIMINANT ANALYSIS? WHAT ARE THE ADVANTAGES AND DISADVANTAGES OF THE DECISION?
Both discriminant analysis and logistic regression are appropriate when the dependent variable is categorical and the independent variables are metric. In the case of a two-group dependent variable either technique might be applied, but only discriminant analysis is capable of handling more than two groups. When the basic assumptions of both methods are met, each gives comparable predictive and classificatory results and employs similar diagnostic measures. Logistic regression has the advantage of being less affected than discriminant analysis when the basic assumptions of normality and equal variance are not met. It also can accommodate nonmetric dummy-coded variables as independent measures. Logistic regression is limited though to the prediction of only a two-group dependent measure. Thus, when more than two groups are involved, discriminant analysis is required.
(3) WHAT CRITERIA COULD YOU USE IN DECIDING WHETHER TO STOP A DISCRIMINANT ANALYSIS AFTER ESTIMATING THE DISCRIMINANT FUNCTION(S)? AFTER THE INTERPRETATION STAGE?
a. Criterion for stopping after derivation. The level of significance must be assessed. If the function is not significant at a predetermined level (e.g., .05), then there is little justification for going further. This is because there is little likelihood that the function will classify more accurately than would be expected by randomly classifying individuals into groups (i.e., by chance).
b. Criterion for stopping after interpretation. Comparison of “hit-ratio” to some criterion. The minimum acceptable percentage of correct classifications usually is predetermined.
(4) WHAT PROCEDURE WOULD YOU FOLLOW IN DIVIDING YOUR SAMPLE INTO ANALYSIS AND HOLDOUT GROUPS? HOW WOULD YOU CHANGE THIS PROCEDURE IF YOUR SAMPLE CONSISTED OF FEWER THAN 100 INDIVIDUALS OR OBJECTS?
When selecting individuals for analysis and holdout groups, a proportionately stratified sampling procedure is usually followed. The split in the sample typically is arbitrary (e.g., 50-50 analysis/hold-out, 60-40, or 75-25) so long as each “half” is proportionate to the entire sample.
There is no minimum sample size required for a sample split, but a cut-off value of 100 units is often used. Many researchers would use the entire sample for analysis and validation if the sample size were less than 100. The result is an upward bias in statistical significance which should be recognized in analysis and interpretation.
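A proportionately stratified split can be sketched as follows. The 60-40 share, the fixed seed, and the function name are assumptions made for this example; any of the splits mentioned above could be substituted.

```python
import random

def stratified_split(labels, analysis_share=0.6, seed=0):
    """Return (analysis, holdout) index lists, splitting each group proportionately."""
    rng = random.Random(seed)
    by_group = {}
    for idx, label in enumerate(labels):
        by_group.setdefault(label, []).append(idx)
    analysis, holdout = [], []
    for indices in by_group.values():
        rng.shuffle(indices)                      # random order within the group
        cut = round(len(indices) * analysis_share)
        analysis.extend(indices[:cut])
        holdout.extend(indices[cut:])
    return sorted(analysis), sorted(holdout)

# Hypothetical sample: 60 members of group A and 40 of group B.
labels = ["A"] * 60 + ["B"] * 40
analysis, holdout = stratified_split(labels)

share_a_analysis = sum(1 for i in analysis if labels[i] == "A") / len(analysis)
share_a_holdout = sum(1 for i in holdout if labels[i] == "A") / len(holdout)
print(len(analysis), len(holdout), share_a_analysis, share_a_holdout)
```

Both subsamples keep the same 60 percent share of group A as the full sample, which is what "proportionate" means here.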
(5) HOW DO YOU DETERMINE THE OPTIMUM CUTTING SCORE?
a. For equal group sizes, the optimum cutting score is defined by:
$$Z_{CE} = \frac{Z_A + Z_B}{2}$$
$Z_{CE}$ = critical cutting score value for equal size groups
$Z_A$ = centroid for group A
$Z_B$ = centroid for group B
b. For unequal group sizes, the optimum cutting score is defined by:
$$Z_{CU} = \frac{N_A Z_A + N_B Z_B}{N_A + N_B}$$
$Z_{CU}$ = critical cutting score value for unequal size groups
$N_A$ = sample size for group A
$N_B$ = sample size for group B
$N = N_A + N_B$ = total sample size
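The two cutting-score formulas above translate directly into code. The sketch below follows the formulas exactly as given in this answer; the centroids and group sizes are hypothetical values, and the function names are invented for the example.

```python
def cutting_score_equal(z_a, z_b):
    """Cutting score for equal group sizes: midpoint of the two centroids."""
    return (z_a + z_b) / 2

def cutting_score_unequal(n_a, z_a, n_b, z_b):
    """Cutting score for unequal group sizes: centroids weighted by group size."""
    return (n_a * z_a + n_b * z_b) / (n_a + n_b)

# Hypothetical centroids and group sizes.
z_a, z_b = -1.0, 1.5
n_a, n_b = 60, 40

print(cutting_score_equal(z_a, z_b))
print(cutting_score_unequal(n_a, z_a, n_b, z_b))
```

When the two groups are the same size, the weighted formula reduces to the simple midpoint, so the equal-size case is a special case of the unequal one.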
(6) HOW WOULD YOU DETERMINE WHETHER OR NOT THE CLASSIFICATION ACCURACY OF THE DISCRIMINANT FUNCTION IS SUFFICIENTLY HIGH RELATIVE TO CHANCE CLASSIFICATION?
Some chance criterion must be established. This is usually a fairly straight-forward function of the classifications used in the model and of the sample size. The authors then suggest the following criterion: the classification accuracy (hit ratio) should be at least 25 percent greater than by chance.
Another test would be to use a test of proportions to examine for significance between the chance criterion proportion and the obtained hit-ratio proportion.
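A rough sketch of the chance comparison follows. The answer above does not say how the chance criterion is computed, so two commonly used definitions are assumed here: the proportional chance criterion (the sum of squared group proportions) and the maximum chance criterion (the share of the largest group). The group sizes, the hit ratio, and the function names are hypothetical.

```python
def proportional_chance(group_sizes):
    """Proportional chance criterion: sum of squared group proportions."""
    total = sum(group_sizes)
    return sum((n / total) ** 2 for n in group_sizes)

def maximum_chance(group_sizes):
    """Maximum chance criterion: proportion of cases in the largest group."""
    return max(group_sizes) / sum(group_sizes)

def beats_chance(hit_ratio, chance, margin=0.25):
    """True if the hit ratio is at least 25 percent greater than chance."""
    return hit_ratio >= chance * (1 + margin)

# Hypothetical two-group problem with 60 and 40 cases and a 70 percent hit ratio.
sizes = [60, 40]
hit_ratio = 0.70

print(proportional_chance(sizes), beats_chance(hit_ratio, proportional_chance(sizes)))
print(maximum_chance(sizes), beats_chance(hit_ratio, maximum_chance(sizes)))
```

The same hit ratio can pass one criterion and fail the other, so the choice of chance criterion should be made before the results are examined.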
(7) HOW DOES A TWO-GROUP DISCRIMINANT ANALYSIS DIFFER FROM A THREE-GROUP ANALYSIS?
In many cases, the dependent variable consists of two groups or classifications, for example, male versus female. In other instances, more than two groups are involved, such as a three-group classification involving low, medium, and high classifications. Discriminant analysis is capable of handling either two groups or multiple groups (three or more). When two classifications are involved, the technique is referred to as two-group discriminant analysis. When three or more classifications are identified, the technique is referred to as multiple discriminant analysis.
(8) WHY SHOULD A RESEARCHER STRETCH THE LOADINGS AND CENTROID DATA IN PLOTTING A DISCRIMINANT ANALYSIS SOLUTION?
Plots are used to illustrate the results of a multiple discriminant analysis. By using the statistically significant discriminant functions, the group centroids can be plotted in the reduced discriminant function space so as to show the separation of the groups. Plots are usually produced for the first two significant functions. Frequently, plots are less than satisfactory in illustrating how the groups differ on certain variables of interest to the researcher. In this case stretching the discriminant loadings and centroid data, prior to plotting the discriminant function, aids in detecting and interpreting differences between groups. Stretching the discriminant loadings by considering the variance contributed by a variable to the respective discriminant function gives the researcher an indication of the relative importance of the variable in discriminating among the groups. Group centroids can be stretched by multiplying by the approximate F-value associated with each of the discriminant functions. This stretches the group centroids along the axis in the discriminant plot that provides more of the accounted-for variation.
(9) HOW DO LOGISTIC REGRESSION AND DISCRIMINANT ANALYSES EACH HANDLE THE RELATIONSHIP OF THE DEPENDENT AND INDEPENDENT VARIABLES?
Discriminant analysis derives a variate, the linear combination of two or more independent variables that will discriminate best between the dependent variable groups. Discrimination is achieved by setting variate weights for each variable to maximize between group variance. A discriminant (z) score is then calculated for each observation. Group means (centroids) are calculated and a test of discrimination is the distance between group centroids.
Logistic regression forms a single variate more similar to multiple regression. It differs from multiple regression in that it directly predicts the probability of an event occurring. To define the probability, logistic regression assumes the relationship between the independent and dependent variables resembles an S-shaped curve. At very low levels of the independent variables, the probability approaches zero. As the independent variable increases, the probability increases. Logistic regression uses a maximum likelihood procedure to fit the observed data to the curve.
(10) WHAT ARE THE DIFFERENCES IN ESTIMATION AND INTERPRETATION BETWEEN LOGISTIC REGRESSION AND DISCRIMINANT ANALYSIS?
Estimation of the discriminant variate is based on maximizing between group variance. Logistic regression is estimated using a maximum likelihood technique to fit the data to a logistic curve. Both techniques produce a variate that gives information about which variables explain the dependent variable or group membership. Logistic regression may be more comfortable for many to interpret in that it resembles the more commonly seen regression analysis.
(11) EXPLAIN THE CONCEPT OF ODDS AND WHY IT IS USED IN PREDICTING PROBABILITY IN A LOGISTIC REGRESSION PROCEDURE.
One of the primary problems in using any predictive model to estimate probability is that it is difficult to “constrain” the predicted values to the appropriate range. Probability values should never be lower than zero or higher than one. Yet we would like a straightforward method of estimating the probability values without having to utilize some form of nonlinear estimation. The odds value is a way to express any probability in a metric that is not confined to the zero-to-one range: the odds have no upper limit, and their logarithm has no lower limit either. The odds value is simply the probability of being in one of the groups divided by the probability of being in the other group. Since we only use logistic regression for two-group situations, we can always calculate the odds knowing just one of the probabilities (since the other probability is just 1 minus that probability). The odds value provides a convenient transformation of a probability value into a form more conducive to model estimation.
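The transformation between probability and odds can be sketched in a few lines. The function names are invented for this illustration; the logit (log of the odds) is included because it is the unbounded quantity that logistic regression models as a linear function of the predictors.

```python
import math

def odds_from_probability(p):
    """Odds of being in a group: p divided by the probability of the other group."""
    return p / (1 - p)

def probability_from_odds(odds):
    """Inverse transformation: recover the probability from the odds."""
    return odds / (1 + odds)

def logit(p):
    """Natural log of the odds; ranges over all real numbers."""
    return math.log(odds_from_probability(p))

for p in (0.1, 0.5, 0.8):
    odds = odds_from_probability(p)
    print(p, odds, logit(p), probability_from_odds(odds))
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0; probabilities above 0.5 give positive logits and probabilities below 0.5 give negative ones.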
Cluster Analysis
ANSWERS TO QUESTIONS
(1) WHAT ARE THE BASIC STAGES IN THE APPLICATION OF CLUSTER ANALYSIS?
Partitioning – the process of determining if and how clusters may be developed.
Interpretation – the process of understanding the characteristics of each cluster and developing a name or label that appropriately defines its nature.
Profiling – stage involving a description of the characteristics of each cluster to explain how they may differ on relevant dimensions.
(2) WHAT IS THE PURPOSE OF CLUSTER ANALYSIS AND WHEN SHOULD IT BE USED INSTEAD OF FACTOR ANALYSIS?
Cluster analysis is a data reduction technique whose primary purpose is to identify similar entities from the characteristics they possess. Cluster analysis identifies and classifies objects or variables so that each object is very similar to others in its cluster with respect to some predetermined selection criteria.
As you may recall, factor analysis is also a data reduction technique and can be used to combine or condense large numbers of people into distinctly different groups within a larger population (Q factor analysis).
Factor analytic approaches to clustering respondents are based on the intercorrelations between the means and standard deviations of the respondents resulting in groups of individuals demonstrating a similar response pattern on the variables included in the analysis. In a typical cluster analysis approach, groupings are devised based on a distance measure between the respondent’s scores on the variables being analyzed.
Cluster analysis should then be employed when the researcher is interested in grouping respondents based on their similarity/dissimilarity on the variables being analyzed rather than obtaining clusters of individuals who have similar response patterns.
(3) WHAT SHOULD THE RESEARCHER CONSIDER WHEN SELECTING A SIMILARITY MEASURE TO USE IN CLUSTER ANALYSIS?
The analyst should remember that in most situations, different distance measures lead to different cluster solutions, and it is advisable to use several measures and compare the results to theoretical or known patterns. Also, when the variables have different units, one should standardize the data before performing the cluster analysis. Finally, when the variables are intercorrelated (either positively or negatively), the Mahalanobis distance measure is likely to be the most appropriate because it adjusts for intercorrelations and weights all variables equally.
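The effect of standardizing variables measured in different units can be sketched as follows. The respondents are invented, the helper names are hypothetical, and z-scores with Euclidean distance are used as one common choice among several.

```python
import math
import statistics

def standardize(column):
    """Convert a column of raw values to z-scores."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(v - mean) / sd for v in column]

def euclidean(p, q):
    """Straight-line distance between two respondents."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_to_first(rows):
    """Index of the respondent closest to respondent 0."""
    return min(range(1, len(rows)), key=lambda i: euclidean(rows[0], rows[i]))

# Hypothetical respondents: income in dollars, age in years.
income = [30000, 31000, 34000, 90000, 60000]
age = [25, 60, 26, 40, 55]

raw_rows = list(zip(income, age))
std_rows = list(zip(standardize(income), standardize(age)))

print(nearest_to_first(raw_rows))  # income dominates because of its larger units
print(nearest_to_first(std_rows))  # both variables now contribute comparably
```

On the raw data the 35-year age gap between respondents 0 and 1 is swamped by income differences measured in thousands of dollars; after standardization the respondent who is similar on both variables becomes the nearest neighbor.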
(4) HOW DOES THE RESEARCHER KNOW WHETHER TO USE HIERARCHICAL OR NONHIERARCHICAL CLUSTER TECHNIQUES? UNDER WHICH CONDITIONS WOULD EACH APPROACH BE USED?
The choice of a hierarchical or nonhierarchical technique often depends on the research problem at hand. In the past, hierarchical clustering techniques were more popular, with Ward’s method and average linkage being probably the best available. Hierarchical procedures do have the advantage of being fast and taking less computer time, but they can be misleading because undesirable early combinations may persist throughout the analysis and lead to artificial results. To reduce this possibility, the analyst may wish to cluster analyze the data several times after deleting problem observations or outliers.
However, the K-means procedure appears to be more robust than any of the hierarchical methods with respect to the presence of outliers, error disturbances of the distance measure, and the choice of a distance measure. The choice of the clustering algorithm and solution characteristics appears to be critical to the successful use of cluster analysis.
If a practical, objective, and theoretically sound approach can be developed to select the seeds or leaders, then a nonhierarchical method can be used. If the analyst is concerned with the cost of the analysis and has an a priori knowledge as to initial starting values or number of clusters, then a hierarchical method should be employed.
Punj and Stewart (1983) suggest a two-stage procedure to deal with the problem of selecting initial starting values and clusters. The first step entails using one of the hierarchical methods to obtain a first approximation of a solution. Then select candidate number of clusters based on the initial cluster solution, obtain centroids, and eliminate outliers. Finally, use an iterative partitioning algorithm using cluster centroids of preliminary analysis as starting points (excluding outliers) to obtain a final solution.
Punj, Girish and David Stewart, “Cluster Analysis in Marketing Research: Review and Suggestions for Application,” Journal of Marketing Research, 20 (May 1983), pp. 134-148.
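The two-stage idea can be sketched in simplified form. This is an illustration under several assumptions, not the procedure as published: stage 1 merges the clusters with the closest centroids (a stand-in for whichever hierarchical method is chosen), the number of clusters is fixed in advance, the outlier-elimination step is omitted, and the data points and function names are invented.

```python
import math

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def agglomerate(points, k):
    """Stage 1: merge the two clusters with the closest centroids until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return [centroid(c) for c in clusters]

def k_means(points, seeds, max_iter=100):
    """Stage 2: iterative partitioning started from the stage-1 centroids."""
    centers = list(seeds)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centers)), key=lambda c: distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no observation changed cluster, so the solution is stable
        assignment = new_assignment
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = centroid(members)
    return assignment, centers

# Hypothetical two-dimensional observations forming two loose groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
seeds = agglomerate(points, k=2)
assignment, centers = k_means(points, seeds)
print(assignment, centers)
```

Starting the partitioning from hierarchical centroids instead of random seeds is the point of the two-stage approach: the final solution no longer depends on an arbitrary choice of starting values.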
(5) HOW CAN YOU DECIDE HOW MANY CLUSTERS TO HAVE IN YOUR SOLUTION?
Although no standard objective selection procedure exists for determining the number of clusters, the analyst may