Z-score calculation based on control group

carolinadamotta
Hello!
I am trying to standardize my variables in SPSS, but I cannot find the
correct procedure. I have a clinical and a non-clinical group, and my goal
is to standardize the scores based only on the scores from the control
group (non-clinical sample). I can only produce z-scores for the complete
sample, or within each group using Split File. Neither is what I want,
because the mean scores of the clinical group are expected to differ a lot
from those of the control group. If I use the filter option, then the
z-scores are calculated only for my control group. How can I set the
reference group so that SPSS calculates z-scores for all participants in my
sample based on that subsample? Thank you.




--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Z-score calculation based on control group

Jon Peck
I am not sure why you want to do this, but the code below shows an example.  It uses the employee data.sav file shipped with Statistics.  It assumes that minority is the group variable and 0 indicates the first (pre) group.  Variables are listed in order so that TO can be used.

DATASET NAME main.
DATASET DECLARE stats.
AGGREGATE
  /OUTFILE='stats'
  /BREAK=minority
  /salary_mean=MEAN(salary) 
  /salbegin_mean=MEAN(salbegin) 
  /jobtime_mean=MEAN(jobtime) 
  /salary_sd=SD(salary) 
  /salbegin_sd=SD(salbegin)
  /jobtime_sd=SD(jobtime) .

COMPUTE group=0.

SORT CASES BY group.
DATASET ACTIVATE stats.
RENAME VARIABLES  (minority=group).
SORT CASES BY group.
DATASET ACTIVATE main.
MATCH FILES /FILE=*
  /TABLE='stats'
  /BY group
  /DROP= group.

DO REPEAT vars=salary to jobtime/means = salary_mean to jobtime_mean/stds = salary_sd to jobtime_sd.
COMPUTE vars = (vars - means) / stds.
END REPEAT.
* Run the pending transformations so the standardized values appear in the Data Editor.
EXECUTE.



--
Jon K Peck
[hidden email]


Re: Z-score calculation based on control group

Art Kendall
In reply to this post by carolinadamotta
What will you use that kind of z-score for?

There are several ways to do this: AGGREGATE, DESCRIPTIVES on Group 1, etc.

One very simple way is in the syntax below. If this is a one-off, it should
work.

* prepare two distributions as if for t-test.
set seed 20101802.
input program.
   loop #i = 1 to 25.
      compute group =1.
      compute x = rv.normal(22,5).
      end case.
   end loop.
   loop #i = 1 to 25.
      compute group =2.
      compute x = rv.normal(27,6).
      end case.
   end loop.
   end file.
end input program.
T-TEST GROUPS=group(1 2) /variables = x.


* use the t-test results
* copy mean and SD to compute below.
compute Group1Z = (x-22.5370)/4.98600.
formats Group (f1) x (f2) Group1Z (f5.2).
list.
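
For the AGGREGATE route mentioned above, a minimal sketch might look like the following. It assumes a tiny made-up dataset, and the names ctrl_x, x_mean, x_sd and z_ctrl are hypothetical. The idea is only that AGGREGATE can add the reference group's mean and SD to every case, so nothing has to be copied by hand.

```spss
* Sketch only: the data and the names ctrl_x, x_mean, x_sd, z_ctrl are made up.
data list free / group x.
begin data
1 20 1 22 1 24 2 26 2 30 2 34
end data.
* Copy x for the reference group only; ctrl_x stays missing for group 2.
if (group = 1) ctrl_x = x.
* No BREAK subcommand: the whole file is one group, so every case gets
  the group 1 mean and SD.
aggregate
  /outfile=* mode=addvariables
  /x_mean = mean(ctrl_x)
  /x_sd = sd(ctrl_x).
compute z_ctrl = (x - x_mean) / x_sd.
formats z_ctrl (f5.2).
list.
```

In this toy dataset group 1 has a mean of 22 and an SD of 2, so the last case (x = 34) should get z_ctrl = 6.00.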







-----
Art Kendall
Social Research Consultants

Re: Z-score calculation based on control group

Rich Ulrich
In my experience with that problem, I wanted to preserve the
computation - both for documentation purposes, and for applying
it (potentially) to other sets of data.  So I used Art's solution, rather
than Jon's.

Also, in my experience, I found it much more readable and easier
for others to grasp when I used T-scores (mean of 50, SD of 10) rather
than z-scores. For that, Art's final lines would read -
* multiply z-score by 10 to change SD; add 50 to change Mean.
compute Group1Z = 50 + 10 * (x-22.5370)/4.98600.
formats Group(f1) x (f2) Group1Z (f2).


When the original score was a scale with only a few points (instead
of being more generally continuous), I might round off the result so
that it would never have, in any sort of listing or computation, a
fraction after a decimal point.
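
As a rough sketch of the rounding idea (made-up data; Group1T is a hypothetical name, and 22 and 3 are the mean and SD of group 1 in this toy dataset), RND can be wrapped around the T-score formula:

```spss
* Sketch only: toy data. Group 1 (19, 22, 25) has mean 22 and SD 3.
data list free / group x.
begin data
1 19 1 22 1 25 2 26 2 30 2 33
end data.
* T-score scaled to the group 1 norms, rounded to a whole number.
compute Group1T = rnd(50 + 10 * (x - 22) / 3).
formats Group1T (f3).
list.
```

The case with x = 26 would be 63.33 before rounding and is stored as 63.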

--
Rich Ulrich



Re: Z-score calculation based on control group

Jon Peck
On the other hand, the possibility of error is greater with manually generated COMPUTE commands, especially if there are a lot of variables.

--
Jon K Peck
[hidden email]


Re: Z-score calculation based on control group

Rich Ulrich
Good point, Jon.

We cross-check the manual COMPUTEs by confirming that the
norming group DOES have the desired (mean, SD) of (0,1) or (50,10) on the
new score.
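
A minimal sketch of that check, assuming made-up data and hypothetical names (group 1 is the norming group), could restrict DESCRIPTIVES to the norming group with TEMPORARY and SELECT IF:

```spss
* Sketch only: toy data. Group 1 (20, 22, 24) has mean 22 and SD 2.
data list free / group x.
begin data
1 20 1 22 1 24 2 26 2 30 2 34
end data.
compute Group1Z = (x - 22) / 2.
* TEMPORARY limits the SELECT IF to the next procedure only.
temporary.
select if (group = 1).
descriptives variables = Group1Z /statistics = mean stddev.
```

The output should show a mean of 0 and an SD of 1 for the norming group, so a typo in either constant shows up immediately.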

This sort of thing is part of why data clean-up and prep often
takes hours, whereas "doing the analyses" on a totally prepared
dataset might be measured in minutes.

--
Rich Ulrich


Re: Z-score calculation based on control group

Jon Peck
Indeed, but if one is applying a mean/sd standardization from the pre group to the post group, the latter will not have the same standardized values, making that harder to check.

Of course, one could add a PRINT command for the stats dataset to document the transformation applied, though, if there are lots of variables, a TRANSPOSE before printing would be a good idea.
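
A sketch of that documentation step, with made-up data and hypothetical dataset and variable names. In syntax, transposing is done with FLIP, and LIST is used here in place of PRINT to show the result:

```spss
* Sketch only: toy data; main, stats, a and b are hypothetical names.
data list free / group a b.
begin data
0 1 10 0 3 30 1 5 50 1 9 70
end data.
dataset name main.
dataset declare stats.
aggregate
  /outfile='stats'
  /break=group
  /a_mean b_mean = mean(a b)
  /a_sd b_sd = sd(a b).
dataset activate stats.
* One row per statistic and one column per group.
flip variables = a_mean to b_sd.
list.
```

FLIP replaces the active dataset, so it is safest to run it on stats only after the means and SDs have been matched to the main file, or on a copy of stats.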

--
Jon K Peck
[hidden email]


Re: Z-score calculation based on control group

Bruce Weaver
Administrator
In reply to this post by Jon Peck
Rather than using MATCH FILES, I would have set a filter for the control
(non-clinical) group, then used AGGREGATE to write the needed means and SDs
(from the control group only) to the working dataset, and gone from there.
E.g.,

* Change path on next line as needed.
GET FILE = "C:\SPSSdata\Employee data.sav".

COMPUTE Clinical = minority.
COMPUTE NonClin = NOT Clinical.
FORMATS Clinical NonClin (F1).
CROSSTABS Clinical BY NonClin.

* The main code begins here--change variable names as needed.
SORT CASES BY Clinical(A).
COMPUTE NonClin = Clinical EQ 0.
FORMATS NonClin (F1).
* Use AGGREGATE to write needed means and SDs from the Control Group to the working dataset.
FILTER BY NonClin. /* Use only non-clinical data.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITE=YES
  /salary_mean salbegin_mean jobtime_mean=MEAN(salary salbegin jobtime)
  /salary_sd salbegin_sd jobtime_sd=SD(salary salbegin jobtime) .
USE ALL.
FILTER OFF.

* Write means and SDs from the control group to records for the other group.
DO REPEAT v = salary_mean to jobtime_sd.
 IF MISSING(v) v = LAG(v).
END REPEAT.

* Now compute the z-scores.
DO REPEAT
  z = z_salary z_salbegin z_jobtime /
  v = salary to jobtime /
  m = salary_mean to jobtime_mean /
  sd = salary_sd to jobtime_sd.
COMPUTE z = (v-m) / sd.
END REPEAT.

* Verify that it worked as expected.
SPLIT FILE BY clinical.
DESCRIPTIVES salary_mean to jobtime_mean salary_sd to jobtime_sd.
DESCRIPTIVES z_salary to z_jobtime.
SPLIT FILE OFF.

* As expected, mean and SD for the z-scores are 0 and 1 respectively for the non-clinical group.

DELETE VARIABLES salary_mean to jobtime_sd.



--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Z-score calculation based on control group

carolinadamotta
In reply to this post by carolinadamotta
Hi!
thank you all for the replies.

I am not sure my previous message reached the list, so below I explain the purpose of calculating z-scores a little better.

The data I'm analyzing are cognitive performance scores from a battery of 19 different tests that yield several scores each. I have a clinical group (36 participants diagnosed with schizophrenia) and a non-clinical group (~100 healthy controls). The groups do not differ in age or other relevant characteristics, so I'm trying to characterize the cognitive decline shown by the chronic patients across several domains (e.g. memory, attention, etc.).

It is current practice to standardize the battery's test scores as z-scores prior to analysis, and to then calculate other types of scores such as task efficiency (based on the trade-off between accuracy and speed scores). The group of chronic patients is expected to perform significantly worse on several tasks in terms of accuracy and speed, so I figured that transforming scores based on the complete sample would bias the mean and SD values used to calculate the z-scores. Was I thinking correctly?

My intention was to let SPSS know that the non-clinical group is the "normative group" and that scores should be standardized for all participants based on the healthy controls' scores. Group is coded as a dummy variable (0 = healthy control; 1 = chronic patient), so I can start from there. I am not very familiar with the AGGREGATE function, but it seems like the most feasible approach. Filters didn't work. I'll try this and see how it goes.


Re: Z-score calculation based on control group

carolinadamotta
In reply to this post by carolinadamotta
Hello,

I've been trying to run the example syntax you sent me so I could understand how to apply it to my database. Jon's syntax at some point produced empty variables on the employee data, while Art's solution seemed more on the spot. However, I have hundreds of variables to transform, so typing each mean and SD by hand seems a bit risky. Alternatively, I could simply go to the usual Compute Variable dialog and build a simple syntax for each variable (I'm not used to the SPSS programming language, so I find some things really hard to read, let alone write).

I've been experimenting with the software on a simpler database I created, using aggregate, split file, everything I knew... I understand that when I z-transform based on all cases I still get about the same distributions if I plot the scores, for instance. But the results can be very different: the same participant from the clinical sample can have a z-score of 1.3 if the z-transformation is based on my whole sample (clinical and non-clinical, which pulls the mean score down and can increase the SD) but a z-score above 7 if I transform based only on the control group (how far the participant's performance is from that of an equivalent healthy group of individuals, which is what I want to know). Despite the SD being basically the same, wouldn't such different z-scores from one method or the other substantially bias further data analyses (e.g. t-tests, ANOVAs, etc.)?
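
The size of that gap can be illustrated with a small made-up example (three controls coded 0, three patients coded 1; z_ctrl is a hypothetical name). DESCRIPTIVES with /SAVE standardizes on the whole sample, while the COMPUTE uses the control group's mean and SD:

```spss
* Sketch only: toy data. The controls (20, 22, 24) have mean 22 and SD 2.
data list free / group x.
begin data
0 20 0 22 0 24 1 26 1 30 1 34
end data.
* Whole-sample z-score: /SAVE adds Zx, based on all six cases.
descriptives variables = x /save.
* Control-based z-score, using the control mean and SD.
compute z_ctrl = (x - 22) / 2.
formats Zx z_ctrl (f5.2).
list.
```

For the last patient (x = 34), the whole-sample z-score is about 1.53, while the control-based z-score is 6.00.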

I'm still very baffled that there's not a simpler way to do it in SPSS, and that I'll end up doing this "semi-manually" by writing one syntax for each variable and introducing the control group values by hand...


Re: Z-score calculation based on control group

David Marso
Administrator
Another approach.
Go wide to long using VARSTOCASES.
AGGREGATE to get MEAN and SD.
Drag appropriate values.
Calculate Z scores.
Restore wide structure using CASESTOVARS.
Match to original file (optional).

/* Data simulation */.
NEW FILE.
DATASET CLOSE ALL.
MATRIX.
SAVE UNIFORM(1000,100) /OUTFILE=* /VARIABLES=X001 TO X100.
END MATRIX.
COMPUTE GroupVar=TRUNC(RV.UNIFORM(0,2)).
FREQUENCIES GroupVar.

/* End Data simulation */.
COMPUTE UniqueID=$CASENUM.
DATASET NAME raw.
DATASET COPY rawcopy.
DATASET ACTIVATE rawcopy.
VARSTOCASES /MAKE Values FROM X001 TO X100 /INDEX=VarName(Values).
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES
 /BREAK=GroupVar VarName
 /MeanValue=MEAN(Values)
 /SDValue=SD(Values).
SORT CASES BY VarName (A) GroupVar (A).
/* Assumes you want to use the 0 group as norm */.
/* Otherwise SORT descending (D) and DO IF (GroupVar EQ 0) */.
DO IF (VarName=LAG(VarName) AND GroupVar EQ 1).
+  COMPUTE MeanValue=LAG(MeanValue).
+  COMPUTE SDValue=LAG(SDValue).
END IF.

COMPUTE ZScoreValue=(Values-MeanValue)/SDValue.
SORT CASES BY UniqueID VarName.
DELETE VARIABLES Values GroupVar MeanValue SDValue.
CASESTOVARS ID=UniqueID /INDEX=VarName .
RENAME VARIABLES (X001 TO X100 =Z001 TO Z100).
MATCH FILES /FILE=raw /FILE=* /BY UniqueID.
EXECUTE.



-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Z-score calculation based on control group

Rich Ulrich
In reply to this post by carolinadamotta
David fully answered the "how-to" question. I will address some others
that are explicit (ANOVA) and implicit (scaling and testing choices).

No, ANOVA testing of one variable is not affected by any linear transformation/
standardization which is applied to the whole sample. If it plots the same, it tests
the same.  ("Centering" when computing an interaction gives a result that does not
plot the same.)
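
To make the linear-transformation point concrete, here is a minimal sketch on made-up data (the variable names id, grp, x, const, x_ctrl and all the values are assumptions for illustration, not from the thread). The control-based and whole-sample z-scores differ only by a rescaling and a shift, z_ctrl = (s_all/s_ctrl)*z_all + (m_all - m_ctrl)/s_ctrl, so they should correlate perfectly and ONEWAY should report the same F for x, z_all and z_ctrl.

```spss
* Hypothetical toy data: grp 0 = control, grp 1 = clinical.
DATA LIST FREE / id grp x.
BEGIN DATA
1 0 1  2 0 2  3 0 1  4 0 2  5 0 1
6 1 5  7 1 7  8 1 6  9 1 8  10 1 9
END DATA.

* Copy x for controls only, so MEAN/SD of x_ctrl ignore the clinical cases.
COMPUTE const = 1.
IF (grp = 0) x_ctrl = x.

* A constant break variable attaches the same statistics to every case.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=const
  /m_all = MEAN(x)
  /s_all = SD(x)
  /m_ctrl = MEAN(x_ctrl)
  /s_ctrl = SD(x_ctrl).

COMPUTE z_all = (x - m_all) / s_all.
COMPUTE z_ctrl = (x - m_ctrl) / s_ctrl.

* Same F and p expected for all three dependent variables.
ONEWAY x z_all z_ctrl BY grp.
* The two z-scores are linear functions of each other.
CORRELATIONS /VARIABLES=z_all z_ctrl.
```

The individual z values change a great deal between the two norms, but the group comparison does not.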

I've never wanted to test "hundreds" of separate variables, though I have had
that many if I counted items on scales. Composite scores or factors increase the
reliability of values being tested, and smooth out some distributional oddities.
If I were presenting hundreds of results, I think that the original, base values
would be meaningful (from their anchors) as "means" in a way that z-scores are
not.  I compute "factor scores" from items as averages, not totals, in order to
preserve the original anchors as labels for interpretation.

When I do compute z-scores of my composite scores, I prefer to present them as
"T-scores" -- with a mean of 50, SD of 10.  These are easier to read when listed,
and may not need the decimal point or decimal fractions, even when presenting group
means. It is also /obvious/ that these are not averages of the original items.
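
As a small worked example of that rescaling (a sketch only; z_ctrl and t_ctrl are hypothetical names, and the four z values are made up, with 1.3 and 7 echoing the figures cited in the question), T = 50 + 10*z turns z = 1.3 into T = 63 and z = 7 into T = 120:

```spss
* Hypothetical z-scores; replace with your own z-score variable.
DATA LIST FREE / z_ctrl.
BEGIN DATA
-1.0 0 1.3 7.0
END DATA.

* T-score: mean 50, SD 10 in the reference group.
COMPUTE t_ctrl = 50 + 10 * z_ctrl.
* Display without decimals, as suggested above.
FORMATS t_ctrl (F4.0).
LIST.
```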

If you are dealing with scaled items, as I suspect, then a z-score of 7 (as you cite)
reflects the scarcity of positive responses in the control group rather than values
beyond the end of the scale.  But, for a score being tested, that is an "outlier"
which does reflect (as you also mention) heterogeneity of variance... which affects
good testing, especially when group sizes are not equal.  When I see something that
extreme, I /consider/  whether the scores should be transformed in order to do a
better job of representing "equal intervals".   -- If all the controls are 0 or 1, then I
suspect that zero to one is a greater "interval" than the distance between 3 and 4, if
one were to judge by test-retest on individuals, or by inter-rater consistency on the
same subjects.
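
One way to look at the heterogeneity-of-variance issue before deciding on a transformation is sketched below. The data are invented (a control group that is nearly all 0 or 1, and a smaller, more spread-out clinical group), and grp and score are hypothetical names. The HOMOGENEITY keyword requests Levene's test and WELCH requests a test that does not assume equal variances; whether either is appropriate for a given scale is a judgment call, not something this sketch settles.

```spss
* Hypothetical data: 8 controls (grp 0), 5 clinical cases (grp 1).
DATA LIST FREE / grp score.
BEGIN DATA
0 0  0 0  0 1  0 0  0 1  0 0  0 0  0 1
1 2  1 4  1 3  1 1  1 4
END DATA.

* Group descriptives, Levene's test, and the Welch alternative to the usual F.
ONEWAY score BY grp
  /STATISTICS DESCRIPTIVES HOMOGENEITY WELCH.
```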

--
Rich Ulrich










From: SPSSX(r) Discussion <[hidden email]> on behalf of Carolina da Motta <[hidden email]>
Sent: Tuesday, May 28, 2019 8:37 AM
To: [hidden email]
Subject: Re: Z-score calculation based on control group
 
Hello,

I've been trying to run the example syntax you sent me so I could understand how to apply it to my database. Jon's syntax at some point produced empty variables on the employee data, while Art's solution seemed more on the spot. However, I have hundreds of variables to transform, so typing each mean and SD by hand seems a bit risky. Alternatively, I could simply go to the usual Compute Variable dialog and build a simple syntax for each variable (I'm not used to the SPSS programming language, so I find some things really hard to read, let alone write).

I've been experimenting with the software in a simpler database I've created using aggregate, split, all I knew... I understand that when I z-transform based on all cases I still get about the same distributions if I plot the scores, for instance. But the results can be very different: the same participant from the clinical sample can have a z-score of 1.3 if the z-transformation is based on my whole sample (clinical and non-clinical sample, which pulls the mean score down and can increase the SD) but has a z-score above 7 if I transform it based only on the control group (how far is the participant's performance from the performance of an equivalent healthy group of individuals, which is what I want to know). Despite the SD being basically the same, wouldn't this heavily bias further data analyses (e.g. t-tests, ANOVAs, etc.) because of such different z-scores using one method or another?

I'm still very baffled there's not a simpler way to do it in SPSS and I'll end up doing this "semi-manually" by writing one syntax for each variable introducing the control group values by hand...


Re: Z-score calculation based on control group

Kirill Orlov
In reply to this post by David Marso
The same result, now through MATRIX (using some of Kirill's functions: http://www.spsstools.net/en/KO-spssmacros, "MATRIX END MATRIX functions").

*Simulate data.
set seed 367325.
MATRIX.
SAVE UNIFORM(1000,100)/OUTFILE*/VARIABLES X001 TO X100.
END MATRIX.
COMPUTE GroupVar=TRUNC(RV.UNIFORM(0,2)).
FREQUENCIES GroupVar.
DATASET NAME raw.
***********.

*The job.
dataset activate raw.
matrix.
get data /vari= X001 to X100.
get grvar /var= GroupVar.
!freq(grvar%1%dummy%name2%name3). /*Need to turn categorical grvar into binary dummies
!aggr(data%dummy%MEAN%mean). /*Get means by groups
!aggr(data%dummy%VARIANCE%stdev). /*Get stdev by groups
compute stdev= sqrt(stdev).
compute ones= make(nrow(data),1,1).
compute newdata= (data-mean(1,:)(ones,:))/stdev(1,:)(ones,:).
     /*From every case, subtract the mean of the first group (coded "0" in groupvar)
     /*and divide by the stdev of that group
save newdata /out= * /vari= Z001 to Z100.
end matrix.
match files /file= raw /file= * .
exec.
dataset name rawcopy.
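
The !freq and !aggr calls above are macros from Kirill's collection and have to be loaded before that job will run. For readers without them, here is a rough macro-free sketch of the same idea using only DO REPEAT and AGGREGATE. It reuses the thread's simulated names (X001 TO X100, GroupVar with 0 as the control group); const and the C, M, S, Z variable families are hypothetical helper names. It assumes the X variables are contiguous in the file and that your SPSS version accepts TO in the AGGREGATE target list.

```spss
* Simulate data as in the thread.
NEW FILE.
MATRIX.
SAVE UNIFORM(1000,100) /OUTFILE=* /VARIABLES=X001 TO X100.
END MATRIX.
COMPUTE GroupVar = TRUNC(RV.UNIFORM(0,2)).

* Copies of each variable that are filled for control cases only.
COMPUTE const = 1.
DO REPEAT x = X001 TO X100 / c = C001 TO C100.
  IF (GroupVar = 0) c = x.
END REPEAT.

* Control-group mean and SD of every variable, attached to all cases.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=const
  /M001 TO M100 = MEAN(C001 TO C100)
  /S001 TO S100 = SD(C001 TO C100).

* Z-scores for every case, normed on the control group.
DO REPEAT x = X001 TO X100 / m = M001 TO M100
         / s = S001 TO S100 / z = Z001 TO Z100.
  COMPUTE z = (x - m) / s.
END REPEAT.
EXECUTE.

* Clean up helpers (no transformations may be pending at this point).
DELETE VARIABLES const C001 TO C100 M001 TO M100 S001 TO S100.
```

For a real file, replace the X001 TO X100 lists with your own variable range and set the helper ranges to the same length. A quick check is that DESCRIPTIVES on the Z variables, restricted to the control group, gives means of 0 and SDs of 1.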



28.05.2019 18:29, David Marso wrote:
Another approach.
Go wide to long using VARSTOCASES.
AGGREGATE to get MEAN and SD.
Drag appropriate values.
Calculate Z scores.
Restore wide structure using CASESTOVARS.
Match to original file (optional).

/* Data simulation */.
NEW FILE.
DATASET CLOSE ALL.
MATRIX.
SAVE UNIFORM(1000,100)/OUTFILE*/VARIABLES X001 TO X100.
END MATRIX.
COMPUTE GroupVar=TRUNC(RV.UNIFORM(0,2)).
FREQUENCIES GroupVar.

/* End Data simulation */.
COMPUTE UniqueID=$CASENUM.
DATASET NAME raw.
DATASET COPY rawcopy.
DATASET ACTIVATE rawcopy.
VARSTOCASES /MAKE Values FROM X001 TO X100 /INDEX=VarName(Values).
AGGREGATE OUTFILE * MODE=ADDVARIABLES
 /BREAK=GroupVar VarName 
 /MeanValue=MEAN(Values)
 /SDValue=SD(Values).
SORT CASES BY VarName (A) GroupVar (A).
/* Assumes you want to use the 0 group as norm */.
/* Otherwise SORT descending (D) and DO IF (GroupVar EQ 0) */.
DO IF (VarName=LAG(VarName) AND GroupVar EQ 1).
+  COMPUTE MeanValue=LAG(MeanValue).
+  COMPUTE SDValue=LAG(SDValue).
END IF.

COMPUTE ZScoreValue=(Values-MeanValue)/SDValue.
SORT CASES BY UniqueID VarName.
DELETE VARIABLES Values GroupVar MeanValue SDValue.
CASESTOVARS ID=UniqueID /INDEX=VarName .
RENAME VARIABLES (X001 TO X100 =Z001 TO Z100). 
MATCH FILES /FILE=raw /FILE */BY UniqueID.
EXECUTE.


