A colleague wants to develop a ''typology'' of people's risk-taking behaviour based on their responses to a questionnaire which presents a range of Likert scale items designed to measure attitude to risk across a variety of situations.
The idea is: a person fills in the questionnaire, this is compared to the average responses from a group of respondents, and they then receive a 'report' / 'diagnosis' of which type they are in relation to risk taking.
This is not necessarily intended as a fully scientific exercise at this time but rather as a proof of concept that would involve much more validation etc. before the final version was produced.
My colleague considers that they can use factor analysis in this process, but I am not sure how. It seems more like cluster analysis would be appropriate.
I would welcome any thoughts on how to move from questionnaire results to 'report / typology' in SPSS and particularly whether factor analysis or cluster analysis might be appropriate.
Thanks a lot
|
A few points:
(1) Your "colleague" should acknowledge what is the scholarly and theoretical basis for conceptualizing people's behavior as "types". If your colleague doesn't know the literature in this area, he/she/it should probably read it and think about what it means since the theoretical position being taken will influence what type of analysis is to be done.
(2) Point (1) above notwithstanding, it kinda sounds like what your "colleague" is trying to do comes out of the "Q methodology" tradition. I assume that your colleague is familiar with this tradition because he/she/it suggests that factor analysis can be used to form groups of respondents on some basis of their risk taking (this is somewhat odd given the decision-making research in cognitive psychology and elsewhere that identifies how features of a situation make people either risk-tolerant or risk-averse, that is, people don't reflexively respond according to "type" but on their interpretation of information provided in a situation).
Q factor analysis has been traditionally defined as factoring a correlation matrix based on correlations between people, in contrast to R factor analysis, which traditionally focuses on correlations between items (this is the type of factor analysis that is most commonly taught and used). Perhaps your colleague should take a look at the Sage little green book on Q methodology; parts of it are available on books.google.com at:
Presumably your colleague thinks that the Q factor analysis will allow one to use the resulting factor scores in correlational or other types of analyses after the factors have been interpreted (e.g., "risk averse", "risk tolerant", "risk insensitive", etc.). However, as in many areas in statistics and research methods, there are different "factions" who believe that only they have the one "true" perspective on how something should be done (like people's positions on null hypothesis testing) and so it is with Q methodology. For a review of these differences (basically, Stephenson vs Burt, Thurstone, Kerlinger and the rest of the world) see the "Primer" on Q methodology at:
(3) If I am not mistaken, I think a more up-to-date statistical modeling approach is given by latent class analysis which, oddly enough, has its own Sage little green book (though the cover on google looks blue); see:
The author of this book, McCutcheon, does not refer to Q methodology but the connection between the two is made in the following article:
Kroesen, M., Molin, E. J., & van Wee, B. (2011). Policy, personal dispositions and the evaluation of aircraft noise. Journal of Environmental Psychology, 31(2), 147-157.
which is available at:
I could be completely wrong about this, in which case I hope that someone more knowledgeable makes the appropriate corrections. Otherwise, HTH.
-Mike Palij
New York University
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Thank you for a most 'helpful' reply. Many thanks.
|
In reply to this post by Mike
It may be that what is wanted is to reduce the number of variables to a smaller set of fairly uncorrelated variables.
Are the items designed to measure some constructs specifically, or to see if there are bunches of items that "go together"? Do the items represent a set of situations so that individuals may have some pattern of consistency across situations (analogous to the way personality traits are based on consistency across situations)?
In many contexts R factor analysis has been used to create scales, and the scale scores are then used in various heuristic techniques for clustering cases into groups with similar profiles. Q factor analysis can be used as one of those heuristic techniques. (Since the mid-70s, I have advocated using several heuristic techniques and using their consensus to create the profiles to interpret.)
List members could give better advice if you were to describe the situation in more detail. Who are the respondents? How were they chosen? How many are there? Are there pre-defined subgroups? What are the stimulus parts of the questions? How were they chosen? Are the items in previously used and validated summative scales? What is your response scale? Is this study experimental, quasi-experimental, exploratory? Is the data already gathered by you or is it gathered by another source? Etc.
Art Kendall
Social Research Consultants |
Hi Art
It is an exploratory exercise. The scale items used in the questionnaire have been widely used and published. The object is to construct a 'typology' of people (general pop) regarding risk-taking behaviour, mainly in the context of work/business. Respondents complete a questionnaire (of scale items regarding various aspects of 'risk taking') and (it is hoped) they then get a report classifying them into one of (say) seven 'types'. Some apps/sites like this exist already but not for risk taking, e.g. http://www.people-press.org/quiz/political-typology/
The data collection has not yet been carried out but would probably use a convenience sample as it is not funded research at this time. As I said, it is more to get a proof of concept and understand how a fuller project might work.
As I said, the main issue they are struggling with is whether you would use factor analysis in whatever form or cluster analysis to classify a particular respondent into one of the 'types'. It seems more like a case for cluster analysis to me, but I am still not sure how it would work. I know that when you do cluster analysis you can get SPSS to create a 'score' for each person based on the cluster model, but it is not clear how you would then use that to classify someone into a 'type'. Regards |
If the items are parts of scales that are summative (means or totals) and balanced in direction, etc., then the first step would be to check that you are using a correct scoring key. A scoring key tells you which items go into a score and which items need to be reflected.
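To make the idea of a scoring key concrete, here is a minimal sketch in Python rather than SPSS syntax. The item names, scale names and the 5-point response scale are all made up for illustration; a real key would come from the published scales.

```python
from statistics import mean

# Hypothetical scoring key: scale name -> list of (item, reflected?) pairs.
SCORING_KEY = {
    "financial_risk": [("q1", False), ("q2", True), ("q3", False)],
    "social_risk": [("q4", False), ("q5", True)],
}
SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point Likert response scale


def reflect(value):
    """Reverse-score an item: 1 <-> 5, 2 <-> 4, 3 stays 3."""
    return SCALE_MIN + SCALE_MAX - value


def score_case(responses):
    """Return the mean score on each scale for one respondent."""
    scores = {}
    for scale, items in SCORING_KEY.items():
        values = [reflect(responses[item]) if reflected else responses[item]
                  for item, reflected in items]
        scores[scale] = mean(values)
    return scores


# One made-up respondent.
respondent = {"q1": 4, "q2": 2, "q3": 5, "q4": 1, "q5": 4}
print(score_case(respondent))
```

The same key is applied unchanged to every respondent, which is what makes the resulting scale scores comparable across cases and data sets.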
An R factor analysis is one rough check. It can be thought of as creating piles/groups/heaps/sets of items that 'go together'. RELIABILITY is a further check. You would then create scores within cases (ordinary SPSS transformations).
You could then try various similarity measures and grouping heuristics. These approaches create piles/groups/heaps/sets of cases that 'go together'. One heuristic is Q factor analysis. Each approach results in a new variable that is nominal. (So not a "score" in the sense of being at least ordinal.)
Since the mid-70s my habit has been to use several approaches to grouping cases and then use crosstabs to create core cluster assignments. I then use DISCRIMINANT iteratively to decide the final profiles. If you have too many thousands of cases, you need to tweak the approaches: sampling? TWOSTEP? fit indices over a set of k-means solutions?
I have not had an occasion to have a clustering project since I retired but have wondered whether TWOSTEP with the cluster assignments from the different approaches as nominal level input would be a way to create the core profiles.
Art Kendall
Social Research Consultants |
Hi Art thanks for your reply.
The direction of scale issue is dealt with, and reliability also (all groups had very high Cronbach's alpha: 0.8, 0.9, etc.). You mentioned creating scores within cases: I am a little unsure how that would work. I know that when you do cluster analysis or factor analysis you can create a set of variables which contain a value for each case (e.g. a value against cluster 1, cluster 2, cluster 3 or factors 1, 2, 3). The bit I don't understand, and so can't advise my colleague on, is how you would apply that information to classify a particular respondent into a cluster or factor/s. I have exported the model in XML using cluster analysis but am still not sure how this would work in the context of individual respondents who complete a questionnaire and whom we then want to 'classify' within a simple typology (i.e. most resemble type 1, 2 or 3). Thanks for your input - very helpful |
I note the 'scoring wizard' under the Utilities menu in SPSS. I think this would be one solution: do cluster analysis, then export the model in XML, import it and apply it to a new dataset using the scoring wizard. I wonder if there is an equivalent way for factor analysis....
|
Do you anticipate having thousands of cases?
What would be the circumstances for having different data sets? For R factor analysis the scoring key is what is used across situations. (If you are in the US think of the SATs, ACTs, MCAT, GRE, etc.) There is insufficient information to advise on using the scoring wizard. It may be that if you use DISCRIMINANT to refine the core cluster profiles you could use that for scoring other sets of cases. But it is most likely too early in a research program to be looking at that.
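As a rough illustration of what "scoring other sets of cases" against cluster profiles could look like, here is a minimal Python sketch. It does not reproduce DISCRIMINANT or the SPSS scoring wizard; it swaps in a much simpler nearest-profile rule (Euclidean distance to each cluster's mean scale scores). The profile names and values are invented for the example.

```python
import math

# Hypothetical cluster profiles: mean scale scores for each "type",
# as they might come out of an earlier clustering of a reference sample.
PROFILES = {
    "cautious": {"financial_risk": 1.8, "social_risk": 2.1},
    "selective": {"financial_risk": 3.9, "social_risk": 2.0},
    "adventurous": {"financial_risk": 4.2, "social_risk": 4.4},
}


def distance(scores, profile):
    """Euclidean distance between a respondent's scores and one profile."""
    return math.sqrt(sum((scores[k] - profile[k]) ** 2 for k in profile))


def classify(scores):
    """Return the label of the nearest profile plus all distances."""
    distances = {name: distance(scores, p) for name, p in PROFILES.items()}
    best = min(distances, key=distances.get)
    return best, distances


# One made-up new respondent, already scored on the two scales.
new_case = {"financial_risk": 4.0, "social_risk": 2.5}
label, dists = classify(new_case)
print(label)
```

Returning the distances alongside the label lets a report say how closely someone resembles each type, not just which type is nearest.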
Art Kendall
Social Research Consultants |
Art,
I think that the OP has to be clearer on a few additional points:
(1) Is the search for "types" theory-driven or purely data-driven (i.e., some form of "dustbowl empiricism")? If the former, which theories are relevant? If the latter, what controls will be used to protect against oddities in the data (e.g., skew, nonnormality, etc.) that will affect correlations and inferential tests?
(2) Because (1) is unclear, are "types" (a) categories (i.e., unordered, mutually exclusive groups) or (b) dimensions (either unidimensional or multidimensional; a "type" is where a group falls on the unidimensional scale or a cluster of points in multidimensional space)? Note that one can take a categorical approach but either use an absolute criterion of membership (e.g., either you're pregnant or you're not) or graded membership (fuzzy set theory allows graded membership; for the category "birds" robins may be seen as prototypical while emus and kiwis [not the fruit] are still birds but are atypical). How does the OP see this?
(3) If one is mathematically oriented, one might consider the "type" and "category" distinctions in a mathematical context; see:
Type Theory https://en.wikipedia.org/wiki/Type_theory
Categorical Logic https://en.wikipedia.org/wiki/Categorical_logic
One might then consider the topic Statistical Classification https://en.wikipedia.org/wiki/Statistical_classification
Just something to think about.
-Mike Palij
New York University
[hidden email]
----- Original Message -----
From: "Art Kendall" <[hidden email]>
To: <[hidden email]>
Sent: Thursday, June 02, 2016 8:32 AM
Subject: Re: ''TYPOLOGY'' from factor analysis
> Do you anticipate having thousands of cases?
> What would be the circumstances for having different data sets?
>
> For R factor analysis the scoring key is what is used across situations.
> (If you are in the US think of the SATs, ACTs, MCAT, GRE, etc.)
>
> There is insufficient information to advise on using the scoring wizard.
>
> It may be that if you use DISCRIMINANT to refine the core cluster profiles
> you could use that for scoring other sets of cases. But it is most likely
> too early in a research program to be looking at that.
|
The typology will be data driven to a large extent, though I don't think one can ever switch off the ideas absorbed from the literature etc., but the idea is to let the data speak for itself as far as possible.
As I made clear in my initial post, part of what I am asking is whether factor analysis or cluster analysis will be the best way forward. The researchers are not mathematically orientated. |
On Thursday, June 02, 2016 11:18 AM, "researcher" wrote:
> The typology will be data driven to a large extent though I don't think one
> can ever switch off the ideas absorbed from literature etc but the idea is
> to let the data speak for itself as far as possible..
You do understand that different statistical procedures make certain assumptions which may or may not be consistent with one's theoretical assumptions (or conceptual assumptions if one is being vague)? Matching the right statistical procedure to theoretical/conceptual entities is not a simple process.
> As I made clear in my initial post, and part of what i am asking, is
> whether factor analysis or cluster analysis will be the best way forward
Folks who use cluster analysis on a regular basis can correct me if I'm wrong but cluster analysis uses observed correlations to measure the distance between variables or people or both. The basis for why these correlations exist is, I believe, not always clear nor is it clear what role third/unmeasured variables play (which could give rise to spurious correlations).
Principal components analysis (a "type" of factor analysis) is often viewed as a data reduction technique which attempts to derive components (factors, but this usage is confusing for this "type" of analysis) that account for the total variance of the variables. In a sense this is most comparable to cluster analysis but cluster analysis uses correlation to aggregate variables/people that are "close together" in terms of distance as measured by the size of the correlation (higher correlation = closer together) while principal components attempts to find the rank of the correlation matrix (i.e., a smaller square matrix with fewer rows and columns than the original correlation matrix; if one analyzes correlations between people the components [e.g., eigenvalues greater than one] may represent different groups of people, people who are similar load on the same components -- a person's component scores identify which component[s] they most "belong" to).
True factor analysis attempts to explain why observed correlations were obtained by latent variables/factors that account for "common variance" or "true variance" -- the Squared Multiple Correlation of a variable with other variables or of a person with other persons is a lower bound for the common variance. The remaining variance is error variance (i.e., does not explain why the correlations are what they are) and the sum of common variance and error variance gives the total variance (which is used in principal components).
Stephenson was among the first people to use "Q factor analysis", that is, factor analyzing correlations among people, but today people in structural equation modeling (SEM) use the concept of latent classes which are derived from the correlations.
If I remember correctly, I believe that in an earlier post I made similar points and provided links to some of the relevant research literature -- did you read them?
> The researchers are not mathematically orientated.
Maybe they should not be playing with tools that they don't understand?
-Mike Palij
New York University
[hidden email]
|
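The distinction drawn above between correlating items (R) and correlating people (Q) can be shown on a toy data matrix. This is only a sketch in plain Python with invented responses; it computes the two correlation matrices that an R or Q factor analysis would start from and does no factoring itself.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(vectors):
    """Correlate every vector with every other vector."""
    return [[pearson(u, v) for v in vectors] for u in vectors]


# Made-up data: rows are 4 people, columns are 5 Likert items.
data = [
    [5, 4, 5, 1, 2],
    [4, 5, 4, 2, 1],
    [1, 2, 1, 5, 4],
    [2, 1, 2, 4, 5],
]

items = [list(col) for col in zip(*data)]  # transpose: one list per item
r_matrix = correlation_matrix(items)  # 5 x 5, between items (R analysis)
q_matrix = correlation_matrix(data)   # 4 x 4, between people (Q analysis)

for row in q_matrix:
    print([round(value, 2) for value in row])
```

In this toy example the first two people correlate positively with each other and negatively with the last two, which is the kind of pattern a Q analysis would pick up as two groups of respondents.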
In reply to this post by Mike
true.
Art Kendall
Social Research Consultants |
In reply to this post by researcher
We are lacking sufficient information to make strong suggestions.
If you already have psychometrically established scales, then they could be input as the variables for clustering. Scales are frequently bunches/heaps/piles/sets of items that are put together via factor analysis. You would do well to have a conversation with a methodologist who is experienced with clustering and related exploratory/heuristic techniques. It is possible that multidimensional scaling or other techniques would be more appropriate. You really need to better specify the circumstances etc. Are you at a university or other research location? What country are you in? There are Classification Societies in several parts of the world.
Art Kendall
Social Research Consultants |
In reply to this post by researcher
You are starting with items. You want to describe the results on two (at most, three) dimensions. So you want to find the dimensions.
If you start out with two dimensions, separating results into high/middle/low gives you 9 potential groups: or maybe just 4 or 5, once you remove the ones that have almost no people in them, and combine the ones that look pretty similar on other characteristics. The extreme-extremes may be worth separating out, or adding a third dimension may help pick out two or three more groups. That should be enough groups.
Assuming that your items cover the useful universe for whatever outcomes you eventually have in mind, then factor analysis finds the dimensions or "latent factors". Iterate on the communalities and use varimax rotation, and almost no one will complain.
For ease of computation and discussion and other reasons, you want to score each scale as the simple average of the items that have the strongest loadings on it; and to use each item (if you can) on only one scale. For this sort of item, I expect that you should probably find that you can use a cut-off of 0.35, 0.40, or 0.45 -- an item will have only one loading that strong, and a scale is defined by the items that are that strong. - I am saying "strong" instead of "high" in order to be general. That is, a negative correlation of -0.40 is exactly as strong as the positive correlation of 0.40. For an item with a negative loading, the convention is to reverse the scoring before adding it in.
If you are going to do a cluster analysis, do it with the factor scores and not with the items. If you expect one or two items to be /crucial/, you can include them in the variables for the clustering in addition to the factor scores; this will emphasize their importance in a way that would be missed with a clustering of all items.
My bias is against clustering. In clinical psychiatry, the movement for 50 years has been /away/ from specific diagnoses (autism; schizophrenia; depression), which you might call "clusters" (which never lived up to their promise) and towards measuring symptoms on dimensions of rating scales.
--
Rich Ulrich
> Date: Thu, 2 Jun 2016 08:18:27 -0700
> From: [hidden email]
> Subject: Re: ''TYPOLOGY'' from factor analysis
> To: [hidden email]
>
> The typology will be data driven to a large extent though I don;t think one
> can ever switch off the ideas absorbed from literature etc but the idea is
> to let the data speak for itself as far as possible..
>
> As I made clear in my initial post , and part of what i am asking, is
> whether factor analysis or cluster analysis will be the best way forward
>
> The researchers are not mathematically orientated.
|
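The recipe in Rich Ulrich's post (pick items by loading strength, reverse negatively loaded items, average, then split each dimension into high/middle/low) can be sketched as follows. This is Python rather than SPSS, and the loadings, the responses, the 1-5 response scale and the cut points for low/middle/high are all assumptions made up for the example.

```python
CUTOFF = 0.40  # one of the cut-offs suggested in the post

# Hypothetical rotated loadings: item -> loading on each of two factors.
LOADINGS = {
    "q1": (0.72, 0.10),
    "q2": (-0.65, 0.05),
    "q3": (0.12, 0.58),
    "q4": (0.08, 0.49),
    "q5": (0.30, 0.25),  # no strong loading: left out of both scales
}


def build_key(loadings, cutoff=CUTOFF):
    """Assign each item to the single factor on which its loading is strong."""
    key = {}
    for item, row in loadings.items():
        strong = [(i, value) for i, value in enumerate(row) if abs(value) >= cutoff]
        if len(strong) == 1:  # use an item on one scale only
            factor, value = strong[0]
            key.setdefault(factor, []).append((item, value < 0))
    return key


def scale_scores(responses, key, low=1, high=5):
    """Simple average of each scale's items, reversing negatively loaded ones."""
    scores = {}
    for factor, items in key.items():
        values = [(low + high - responses[i]) if rev else responses[i]
                  for i, rev in items]
        scores[factor] = sum(values) / len(values)
    return scores


def band(score, cuts=(2.5, 3.5)):
    """Split a scale score into low / middle / high using assumed cut points."""
    if score < cuts[0]:
        return "low"
    return "middle" if score < cuts[1] else "high"


key = build_key(LOADINGS)
scores = scale_scores({"q1": 5, "q2": 1, "q3": 2, "q4": 3, "q5": 4}, key)
group = tuple(band(scores[f]) for f in sorted(scores))
print(key, scores, group)
```

With two dimensions the `group` tuple takes one of the 9 high/middle/low combinations mentioned in the post; which of those are worth keeping as "types" would still depend on how many people fall into each.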
Rich Ulrich:
Thought I would join with the paragraph. I'd say that, purely analytically, a cluster is just a collection of points, while a factor is just a variable, a dimension. Neither this nor that is vicious - they are two ways to conceptualize reality, both with their assets and limitations. What is potentially vicious is to leave too much significance to labels. Labels of clusters, of factors, of cultural events, etc. Labels tend to become essences and live their own simulacrial lives. We should invent labels freely and creatively, but use them in subsequent practice with certain caution.
Factors (of factor analysis) can indeed be seen, in a sense, as clusters of correlating variables/items (principal components cannot); however, a factor is a unidimensional "simple" thing which is inside or behind "its" items (to force them to correlate) and not the combination of the items themselves (unlike a cluster).
|
In reply to this post by Mike
''Maybe they should not be playing with tools that they don't understand? ''
How incredibly patronising.
There are many ways to do research and those who excel in mathematical aspects may well be deficient in others. Personally I have 20+ years of mixed methods research and have not done too badly career wise. The idea that one needs mathematical expertise in order to do quantitative research or use SPSS in research is nonsense.
Some of the responses here are extremely helpful - others do make me wonder whether their replies are about helping others and sharing knowledge and ideas or just people wanting to massage their own egos.
If nothing constructive to say, perhaps best to say nothing.
|
In reply to this post by Rich Ulrich
Thanks Rich.
Actually some initial data analysis is now done, and from the initial factor analysis it looks like there are about 5 'worthwhile' factors, using the scree plot to judge the cut-off. Oblique rotation was used as there is no reason to think the factors should not be correlated. These five factors seem to replicate patterns in the literature broadly.
So if proceeding along this path, one question would be how they take this 'model' to classify new cases (to 'score' a dataset). E.g. person x fills in the questionnaire (same as the one used to generate the data for the factor analysis) and the researchers would like to determine which factor/s would best explain that person's responses. Perhaps this can be done using the factor scores, though I don't immediately see how. In SPSS, cluster analysis models can be exported in XML and applied to an unscored dataset but this does not seem possible for factor analysis, at least not in the same way. It is cases that they wish to score ultimately, not variables, so that may mean cluster analysis makes more sense.
Could you elaborate just a little on ''Iterate on the communalities''? I understand what communalities are (just about) but am not sure how one iterates on them in SPSS. Thanks |
In reply to this post by researcher
FWIW, I did not take Mike's comment as patronizing. I think he was simply pointing out that when people use statistical methods they don't understand very well (in a "Real Stats, Real Easy" kind of way), the likelihood of GIGO rises. YMMV.
Cheers,
Bruce
p.s. - I am reminded of the "Train tickets" story on this web-page: https://www.staff.ncl.ac.uk/s.j.cotterill/Other_bits/Statistical_Funnies/statistical_funnies.html
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Odd that folks would hire a plumber to fix their toilet, a carpenter to build their house, a mechanic to repair a car etc... but somehow refuse to consult/hire a statistician to figure out their data ;-)
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |