Factor analysis loadings and correlations


Factor analysis loadings and correlations

Kathryn Gardner
Dear list,
 
Can anyone tell me why orthogonal rotation in PCA would produce 3 factors all with positive loadings, but then oblique rotation of the same data produces 3 factors, one of which has all negative loadings? This means that the correlations amongst the factors from oblique rotation result in some factors that are negatively correlated when they should in fact be positively related (all items are scored and worded in the same direction). Can I just change the negative sign to a positive one for this factor? Any help greatly appreciated as I simply can't find the answer to this.
 
I was also wondering if there is a limit in terms of the correlation size amongst oblique factors? That is, what size of correlation is considered too high, suggesting possible factor dependence? Two of my 3 factors correlate above .70, but the scree plot and Kaiser's criterion clearly suggest 3 or even 4 factors.
 
Thanks
Kathryn

Re: Factor analysis loadings and correlations

Hector Maletta
         Kathryn,
         Rotation is rotation. If you rotate a factor by an angle above 90
degrees, your positive loadings will turn negative, just as you will see
everything upside down if you stand on your head. In oblique rotation, each
factor rotates by its own specific angle, and thus some loadings may turn
negative for a specific factor. So it is possible (though not very frequent)
that ALL loadings for a factor change sign after rotation.
         Inverting the sign of ALL the loadings on a factor has no effect
except reversing the meaning of the factor (instead of a factor for, say,
"intelligence", you will have a factor for "imbecility": the same factor
running the other way).
         Now when you have a set of items and extract SEVERAL factors, it is
perfectly possible that one factor is positively correlated with all items
(the main underlying dimension expressed by all items, say intelligence)
while other factors are negatively correlated with some or all of the items:
factors representing things alien to intelligence, say a tendency to be
bored by tests, or physical problems with sight or motor ability impeding
completion of the test, when the items are oriented to measure only
cognitive skills.
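         For concreteness, a minimal, untested sketch in SPSS syntax (the
item names are hypothetical; FAC1_1 to FAC3_1 are the default names SPSS
gives to saved factor scores):

FACTOR
  /VARIABLES item1 TO item12
  /CRITERIA FACTORS(3)
  /EXTRACTION PC
  /ROTATION VARIMAX.
FACTOR
  /VARIABLES item1 TO item12
  /CRITERIA FACTORS(3)
  /EXTRACTION PC
  /ROTATION OBLIMIN
  /SAVE REG(ALL).
* If, say, the second oblique factor comes out with all-negative loadings,
* reversing the sign of its saved score only reverses its meaning.
COMPUTE fac2_rev = -1 * FAC2_1.
EXECUTE.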

         Hector


Test Samples

<R. Abraham>
I have a question regarding comparing several Test Samples to a Control
Sample.

We have 3 different Test Samples with different mail quantities:
200K
100K
50K

We want to compare the results of the different Test Samples to a Control
Sample. My question is: do we need 3 different Control Samples with the
same quantities as the Test Samples (i.e. 200K, 100K & 50K), or is it
sufficient to have just 1 control sample of 350K?

Option 1 - (Test 200K vs 200K; 100K vs 100K & 50K vs 50K)
Test    Control
200K    200K
100K    100K
50K     50K

or

Option 2 - (Test 200K vs 350K; 100K vs 350K & 50K vs 350K)
Test    Control
200K    350K
100K
50K

Are there any issues regarding confidence intervals, significance, etc.? I
would really appreciate it if somebody could provide some answers.

Thanks.

R Abraham

Re: Test Samples

Spousta Jan
Hi,

It depends on how the samples were drawn.

* If all samples are generated from the same pool of addresses using the same method, you can use a single common control group.

* If the samples are from different pools (e.g. different regions, different database vendors...) or you use different methods/criteria for selection/sampling (e.g. types of neighborhood...), you should have three different controls.

Moreover, there is generally no need for the test and control groups to be exactly the same size. I often use control groups of 10K customers or so, because of the opportunity costs connected with the control groups.
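Either way, each comparison then reduces to a simple test-versus-control cross-tabulation of response rates. A minimal, untested sketch in SPSS syntax (CELL and RESPONSE are hypothetical variable names):

* Compare one test cell against the common control group.
TEMPORARY.
SELECT IF (cell = 'TEST200K' OR cell = 'CONTROL').
CROSSTABS
  /TABLES = cell BY response
  /STATISTICS = CHISQ
  /CELLS = COUNT ROW.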

Regards

Jan


Re: Factor analysis loadings and correlations

Kathryn Gardner
In reply to this post by Kathryn Gardner
Thanks, Hector, for your response. I follow now. I had a feeling I could just reverse the sign/meaning of the factor.
 
I was also wondering: is it normal for oblique factors to correlate highly, i.e. >.70, and if they do, does this suggest that there are too many factors and that the two factors should be reduced to one?
 
Thanks
Kathryn

Re: Test Samples

<R. Abraham>
In reply to this post by Spousta Jan
Thanks, Jan, for your quick reply.

Yes, all samples are generated from the same pool of addresses using the
same method. So, from your reply, it should be one common control group.

I thought the same about the size of the test and control groups, but for
some reason my supervisor wants to go with equal sizes.

Thanks again!!

R Abraham





Re: Factor analysis loadings and correlations

Hector Maletta
In reply to this post by Kathryn Gardner
I do not know how usual it is, but it is possible, and its interpretation
would depend on the kind of problem you are analyzing. The result of your
factor analysis is that there are two factors, not one: the variance of the
variables is better accounted for by the two factors, and rotation only has
the effect of associating each factor more clearly with some group of
variables, and of showing (through oblique rotation) that the two factors
are not unrelated to each other. You can always use the (correlated) factors
for further analysis, either applying a secondary or higher-level factor
analysis, or structural equations, or GLM, or whatever, but you had better
be guided by the specifics of your problem and related theory, not by the
mere possibilities afforded by statistics.
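         A minimal, untested sketch of that higher-level route in SPSS
syntax (the item names are hypothetical; FAC1_1 to FAC3_1 are the default
names of the saved factor scores):

* First-order oblique solution, saving regression factor scores.
FACTOR
  /VARIABLES item1 TO item12
  /CRITERIA FACTORS(3)
  /ROTATION OBLIMIN
  /SAVE REG(ALL).
* Second-order analysis: factor the saved first-order scores.
FACTOR
  /VARIABLES FAC1_1 FAC2_1 FAC3_1
  /CRITERIA FACTORS(1).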



Hector




Missingness on covariate

Dale Glaser
Hi all...I would appreciate any opinions to see if I'm slightly wayward in mine!

  I'm on a project where, from a total sample size of n = 129, there are n = 18 (13.95%) missing values for a variable that is an element in deriving an SES-type total score.  The variable that constitutes the missingness in computing the total score is 'occupation', and this variable (which can be segmented into 9 strata) does not have a scoring option for housewife/disabled/retired/student.  The total 'social status score' is a simple additive function: (5 x occupation) + (3 x education), and if both respondent and partner are employed, you take the average of their scores (or just use the score from the sole 'breadwinner').
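  (A sketch of that scoring rule in SPSS syntax; the variable names for the respondent's and partner's occupation and education codes are hypothetical:)

COMPUTE ses_r = 5*occ_r + 3*educ_r.
COMPUTE ses_p = 5*occ_p + 3*educ_p.
* MEAN() averages whichever arguments are valid, so a sole
* breadwinner's score is used on its own.
COMPUTE ses = MEAN(ses_r, ses_p).
EXECUTE.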

  That being said, the n = 18 are housewives/disabled/retired/students who do not have legitimate occupational scores (per this scoring system) and also may not have a partner from whom to derive a score. At first I was thinking that I could at least use the information from education (for which we have the full complement of data) and obtain a score that way, but it seems abundantly clear that scores are based on occupation AND education, and also an education-only score could fall below the lowest boundary value (x = 8).

  Hence, though there are some auxiliary variables that have substantive correlations with the SES score and could serve as possible inputs for an ML/EM approach to missing data (e.g., I could use the multiple imputation option in PRELIS that uses an EM or MCMC algorithm), my feeling is that the structure of this score demands information on both occupation and education and doesn't give the option of occupation OR education; thus, by willy-nilly using some likelihood approach to estimate the parameters, it seems to me our research team would be running afoul of the fundamental foundations/definition of this construct (regardless of the mechanism of missingness...MCAR, MAR, etc.).

  Any opinions would be appreciated......I hate the thought of losing 13.95% of the data, but I am having a hard time justifying a likelihood approach for missing data given the inherent definition of the social status score.

  thank you....Dale


Dale Glaser, Ph.D.
Principal--Glaser Consulting
Lecturer/Adjunct Faculty--SDSU/USD/AIU
President, San Diego Chapter of
American Statistical Association
3115 4th Avenue
San Diego, CA 92103
phone: 619-220-0602
fax: 619-220-0412
email: [hidden email]
website: www.glaserconsult.com

Re: Missingness on covariate

Hector Maletta
         I was called to give an opinion recently on a similar problem in a
survey in Argentina.
         First a conceptual thought. I have found that a frequent and
fundamental research-design flaw causing this kind of problem is thinking of
SES as an attribute of individuals instead of households. The occupation of
those individuals you are concerned about is not "missing", strictly
speaking. They simply do not have an occupation personally: they rely on
someone else's occupation (or income source), possibly another household
member (albeit not a spouse/partner). Perhaps they depend on a working son
or daughter (or parent). They may also have a source of income that is
unrelated to occupation (say, income from property such as rent, interest,
dividends), and any such source may accrue to the respondent or to some
other household member. The housewife or disabled person without an
occupation and without a partner may also be living off unilateral transfers
(welfare payments, fellowships, charity, remittances) or off his/her own
savings, i.e. from a gradually diminishing capital stock.
         Second, some approach to a solution. You are right in the sense
that guessing occupation from education entangles you in a kind of
circularity: 86% of your people would be characterized by education and
occupation, and the other 14% by education and a function of education. But
you've got to do something.
         My own proposal would be to estimate the "ghost" (not properly
"missing") occupational status as a function not (only) of education but of
a number of predictors including age, gender, area of residence, and
whatever other background variables you have at your disposal. Then proceed
with the estimated occupational status as if it were the actual occupational
status. Use multiple imputation if possible. Not very clean, but doable. You
may or may not include education among the predictors.
         On the other hand I see your total score is a simple summation that
gives a weight of 5 to occupation and 3 to education. This may be right, but
it may also be totally arbitrary. Why 5 and 3? Five for jumping one
arbitrary stratum in occupational status, and 3 for jumping one year of
education or one educational level? Why not 6 and 4? Or 7 and 2? Or 1 and 1?
If your total score has these arbitrary elements already built in, why
bother with niceties about those "missing" occupational statuses? On the
face of it, I would say don't worry too much: just use education in those
cases, rescale the education score to put it within the min/max boundaries
of the total score, and be done with it.
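         A sketch of that last suggestion in SPSS syntax. The variable names
are hypothetical, and the assumed boundaries (education coded 1 to 7, total
score running from 8 to 66) are only illustrative; substitute the
instrument's actual values:

* For cases with no codable occupation, rescale the education-only score
* (3*educ, range 3 to 21) into the total-score range (8 to 66).
DO IF MISSING(occ_r) AND MISSING(occ_p).
  COMPUTE ses = 8 + (66 - 8) * (3*educ_r - 3) / (21 - 3).
END IF.
EXECUTE.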

         Hector


Re: Missingness on covariate

Dale Glaser
Thank you very much for your helpful response, Hector...yes, I was thinking of capturing/grabbing any auxiliary variables at our disposal (e.g., parent's/mother's education, age, etc.), though it felt a bit of a data-driven exercise, and then imputing the occupational score.......the occupational score is derived from a 9-level categorical variable, and multiplying by a constant yields a 'respondent occupational score' and a 'partner occupational score'.

   (Hector, apparently the constants you refer to are part of the scaling requirements of the Hollingshead index....though I do see how they could be viewed as arbitrary, I'm assuming the authors had justification for them(?))

  I know there are missing data techniques for categorical variables (Horton & Kleinman, 2007), but in this instance I was thinking of imputing/estimating the scaled score (which is a function of occupation and education), using auxiliary variables that (1) are most correlated and (2) can be theoretically justified.

  Again, thank you very much for your helpful insights.

  dale


Dale Glaser, Ph.D.
Principal--Glaser Consulting
Lecturer/Adjunct Faculty--SDSU/USD/AIU
President, San Diego Chapter of
American Statistical Association
3115 4th Avenue
San Diego, CA 92103
phone: 619-220-0602
fax: 619-220-0412
email: [hidden email]
website: www.glaserconsult.com

SCRIPT QUESTION: how to change size of graph

Max Bell-2
In reply to this post by Kathryn Gardner
Dear list,

I want to export output items to Word using the script command

objOutputDoc.ExportDocument (SpssVisible, "C:\temp\mydoc.doc",
SpssFormatDoc, True).

The graphs remain unchanged and are too wide for Word documents, so I
want to change their size using a script. I took an example that changes
the background colour:

Sub Main
        Dim objOutputDoc As ISpssOutputDoc
        Dim objOutputItems As ISpssItems
        Dim objOutputItem As ISpssItem
        Dim objSPSSIGraph As ISpssIGraph
        Dim objIgraph As ISpssIGraph

        Set objOutputDoc = objSpssApp.GetDesignatedOutputDoc

        'Get the Output Items and loop through to find each IGraph
        Set objOutputItems = objOutputDoc.Items()
        Dim intItemCount As Integer
        For intItemCount = 0 To objOutputItems.Count - 1
                Set objOutputItem = objOutputItems.GetItem(intItemCount)
                If objOutputItem.SPSSType = SPSSIGraph Then
                        ' >>> I believe here a command like this:
                        objIgraph.Width = objIgraph.Width * .5
                End If
        Next
End Sub

But it doesn't work. Does anybody know the answer?

Thanks, Max

Re: SCRIPT QUESTION: how to change size of graph

vlad simion
Hi Max,

You should try using the ExportChartPercent method to control the size of
your charts; look in the scripting help for more information about it.

HTH,

Vlad




--
Vlad Simion
Data Analyst
Tel:      +40 720130611

Change font size of axis value labels by script

Max Bell-2
Hello everybody,
Is it possible to change the font size of the axis value labels of an
IGraph by script? If yes, with what method?
 
Thanks, Max

Counter in macro

Max Bell-2
In reply to this post by vlad simion
Dear List,

I'm writing a macro like this:

DEFINE !VR (!POS !CHAREND('/'))
!DO !Var !IN (!1) .

*counter definition .
????

script 'P:\scripts\ExtractVarLabels.SBS'!CONCAT("('!count;",!Var,"')").
INCLUDE 'P:\Temp\ExtractVarLabel.sps'.

!DOEND.
!ENDDEFINE .


When I call the macro for two variables (e.g. !VR var1 var2 /.), the
macro runs twice, and I want the counter (!count) to be 1 the first time
and 2 the second time. When I run the macro for three variables, the
macro runs three times and the counter should reach 3.

What can I do?

(The script in the macro creates a syntax file which is INCLUDEd directly
after the script runs.)

Thanks, Max

Re: Counter in macro

Jerabek Jindrich
Hello Max,

Unfortunately macro is a pure text affair, and you cannot create a numeric counter and increase it.

In the past, a growing text variable was often used as a counter:

DEFINE !VR (!POS !CHAREND('/'))
!LET !counter=''.
!DO !Var !IN (!1) .

 *counter definition .
!LET !counter = !Concat(!counter,'_')

script 'P:\scripts\ExtractVarLabels.SBS'!CONCAT("('!counter;",!Var,"')").
INCLUDE 'P:\Temp\ExtractVarLabel.sps'.

!DOEND.
!ENDDEFINE .


With the above syntax you will call the script with the parameter '___' on the third run of !DO; then compute the length of the string parameter inside the SB script (length = 3) and you will have the required count. (Please check the syntax of the parameter on the SCRIPT command; my suggestion is not tested.)


Now there could be a better way with Python, but I'm not able to use that feature; a sketch of what it might look like follows.
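A minimal, untested sketch of that Python route (it assumes the SPSS Programmability Extension is installed; the variable list is hypothetical, and spss.Submit simply submits the generated syntax):

BEGIN PROGRAM.
import spss
varlist = ['var1', 'var2', 'var3']   # hypothetical variable names
for i, var in enumerate(varlist):
    count = i + 1                    # a real numeric counter, at last
    # Generate the same two commands the macro produces.
    spss.Submit(r"script 'P:\scripts\ExtractVarLabels.SBS' ('%d;%s')." % (count, var))
    spss.Submit(r"INCLUDE 'P:\Temp\ExtractVarLabel.sps'.")
END PROGRAM.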

best regards
Jindra


Re: Counter in macro

Hal 9000
If you're going to end up computing a variable to determine the count,
why not just include a plain old:


compute CNT = 0.

* inside the loop structure, for each pass: *.
compute CNT = CNT + 1.


?



Re: Counter in macro

Richard Ristow
At 04:30 PM 7/30/2007, Hal 9000 wrote:

>If you're going to end up computing a variable to determine the count,
>why not just include a plain old:
>
>compute CNT = 0.
>
>* inside the loop structure, for each pass: *.
>compute CNT = CNT + 1.

Unfortunately, that won't work here, since the counter is used in a
script invocation:

>script 'P:\scripts\ExtractVarLabels.SBS'
>!CONCAT("('!count;",!Var,"')").
Reply | Threaded
Open this post in threaded view
|

Re: Counter in macro

Richard Ristow
In reply to this post by Max Bell-2
At 03:43 AM 7/30/2007, Max Bell wrote:

>I'm writing a macro like this:
>
>DEFINE !VR (!POS !CHAREND('/'))
>!DO !Var !IN (!1) .
>
>*counter definition .
>????
>
>script
>'P:\scripts\ExtractVarLabels.SBS'!CONCAT("('!count;",!Var,"')").
>INCLUDE 'P:\Temp\ExtractVarLabel.sps'.
>
>!DOEND.
>!ENDDEFINE .

As Jerabek Jindrich wrote, SPSS's macro facility doesn't support
arithmetic of any kind. (The deficiency has long been bemoaned.)
Jerabek is right that a 'growing string' is the best substitute for a
counter, but I think his implementation has a couple of bugs. Jerabek
had,

>!LET !counter = !Concat(!counter,'_')
>
>script
>'P:\scripts\ExtractVarLabels.SBS'!CONCAT("('!counter;",!Var,"')").

That puts the string of underscores (in fact, given the quoting, the
literal text '!counter'), rather than its numerical value, into the
script's parameter. And I think you and Jerabek both have unbalanced
quotes in the arguments to !CONCAT. This is SPSS 15 draft output (WRR:
not saved separately). Macro "!LinEcho" displays a generated line without
executing it - the generated lines won't work in my environment, of
course.


DEFINE !VR2 (!POS !CHAREND('/'))
!LET !Growing=''.
!DO !Var !IN (!1) .

  *counter definition .
!LET !Growing = !Concat(!Growing,'_')
!LET !Count   = !LENGTH(!Growing)

!LinEcho script 'P:\scripts\ExtractVarLabels.SBS'!CONCAT("('",!count,";",!Var,"')").
!LinEcho INCLUDE 'P:\Temp\ExtractVarLabel.sps'.

!DOEND.
!ENDDEFINE .

!VR2 Alpha Beta Gamma /.
Generate: script 'P:\scripts\ExtractVarLabels.SBS' ('1;Alpha')
Generate: INCLUDE 'P:\Temp\ExtractVarLabel.sps'
Generate: script 'P:\scripts\ExtractVarLabels.SBS' ('2;Beta')
Generate: INCLUDE 'P:\Temp\ExtractVarLabel.sps'
Generate: script 'P:\scripts\ExtractVarLabels.SBS' ('3;Gamma')
Generate: INCLUDE 'P:\Temp\ExtractVarLabel.sps'


=========================================
APPENDIX: All code, including test macros,
           in a trace run showing problems
=========================================
*  Macro !MacEcho is used to display a macro expansion   ........... .
      Call  : is used to display a macro expansion ...........
      Result: is used to display a macro expansion ..........
DEFINE !MacEcho(!POS !NOEXPAND !CMDEND)
    ECHO  !QUOTE(!CONCAT('     Call  : ',!1)).
    ECHO  !QUOTE(!CONCAT('     Result: ',!EVAL(!1))).
!ENDDEFINE.

*  Macro !LinEcho is used to display a generated line    ........... .
*  without executing it.                                 ........... .
DEFINE !LinEcho(!POS !NOEXPAND !CMDEND)
    ECHO  !QUOTE(!CONCAT('Generate: ',!1)).
!ENDDEFINE.

DEFINE !VR (!POS !CHAREND('/'))
!LET !counter=''.
!DO !Var !IN (!1) .

  *counter definition .
!LET !counter = !Concat(!counter,'_')

!LinECHO script 'P:\scripts\ExtractVarLabels.SBS'!CONCAT("('!counter;",!Var,"')").
!LinECHO INCLUDE 'P:\Temp\ExtractVarLabel.sps'.

!DOEND.
!ENDDEFINE .

!VR Alpha Beta Gamma /.
Generate: script 'P:\scripts\ExtractVarLabels.SBS' ('!counter;Alpha')
Generate: INCLUDE 'P:\Temp\ExtractVarLabel.sps'
Generate: script 'P:\scripts\ExtractVarLabels.SBS' ('!counter;Beta')
Generate: INCLUDE 'P:\Temp\ExtractVarLabel.sps'
Generate: script 'P:\scripts\ExtractVarLabels.SBS' ('!counter;Gamma')
Generate: INCLUDE 'P:\Temp\ExtractVarLabel.sps'



DEFINE !VR2 (!POS !CHAREND('/'))
!LET !Growing=''.
!DO !Var !IN (!1) .

  *counter definition .
!LET !Growing = !Concat(!Growing,'_')
!LET !Count   = !LENGTH(!Growing)

!LinEcho script 'P:\scripts\ExtractVarLabels.SBS'!CONCAT("('",!count,";",!Var,"')").
!LinEcho INCLUDE 'P:\Temp\ExtractVarLabel.sps'.

!DOEND.
!ENDDEFINE .

!VR2 Alpha Beta Gamma /.
Generate: script 'P:\scripts\ExtractVarLabels.SBS' ('1;Alpha')
Generate: INCLUDE 'P:\Temp\ExtractVarLabel.sps'
Generate: script 'P:\scripts\ExtractVarLabels.SBS' ('2;Beta')
Generate: INCLUDE 'P:\Temp\ExtractVarLabel.sps'
Generate: script 'P:\scripts\ExtractVarLabels.SBS' ('3;Gamma')
Generate: INCLUDE 'P:\Temp\ExtractVarLabel.sps'

Advice about IGraph versus GGRAPH/GPL

Max Bell-2
In reply to this post by Richard Ristow
Hello everybody, I need advice about the following:

I want to automatically create 100% stacked bar charts of several
variables. When I use IGraph I need to change the layout after the chart
is created (I already use an IGraph Look):
- move the legend below the data region (or remove the legend)
- change the scale: the tick interval
- change the appearance of major and minor ticks
- add a reference line

I wonder whether this is possible with scripting?
I also discovered the syntax command GGRAPH (and GPL), and I wonder
whether that command can do what I want.

Before I dive into it, my question is: with what method should I automate
my 100% stacked bar charts?

Thanks in advance.

Max Bell