Conditional recoding?

Smith, Larissa
I have a set of 12 multiple-choice items coded A-D and 1-4, where correct
answers are represented as alpha and incorrect as numeric (MC1 to MC12).  I
need to recode each item into four variables representing each answer
option, recoded one of two ways depending on the correct answer (ia1 to ia12
through id1 to id12).  So for the variables representing an 'A' response,
for instance, I'd need to recode such that:

For i = 1 to 12
If MC(i) = 'A' then
recode MC(i) ('A'=1)('1'=1) (else = 0) INTO ia(i)
else if MC(i) NE 'A' then
recode  MC(i) ('B'=0) ('C'=0) ('D'=0) ('1'=1) (else=sysmis)  INTO  ia(i).

This way if the answer isn't A then I'm only comparing the A response rate
to the correct response rate, as opposed to comparing the A response rate to
everything else.  I haven't been very successful at translating this logic
into SPSS syntax and would appreciate any suggestions anyone has to offer.

- Larissa Smith

Re: Conditional recoding?

Richard Ristow
At 04:20 PM 4/16/2007, Larissa Smith wrote:

>I have a set of 12 multiple-choice items

Variables MC1 to MC12, in your data as you have it.

>Coded A-D and 1-4,

That is, these are all string variables, with eight possible values
(not counting the inevitable coding slips, etc.).

>Correct answers are represented as alpha and incorrect as numeric

It may not matter, but I don't understand this. Each question has four
possible correct responses, and four possible incorrect responses?

>I need to recode each item into four variables representing each
>answer option, recoded one of two ways depending on the correct answer
>(ia1 to ia12 through id1 to id12).

You probably want to step back here, and think (and post) what you want
to compare with what else - conceptually, not as SPSS variables.
Generating four variables from one usually doesn't work well, unless
the one really does represent several measurements mushed together in a
number or string.

>So for the variables representing an 'A' response, for instance, I'd
>need to recode such that:
>
>For i = 1 to 12
>If MC(i) = 'A' then
>recode MC(i) ('A'=1)('1'=1) (else = 0) INTO ia(i)
>else if MC(i) NE 'A' then
>recode  MC(i) ('B'=0) ('C'=0) ('D'=0) ('1'=1)
>(else=sysmis)  INTO  ia(i).
>
>I haven't been very successful at translating this logic into SPSS
>syntax and would appreciate any suggestions.

First, syntax. (See notes on the logic, following.) In SPSS, you use
LOOP instead of FOR, and DO IF instead of IF. Like this, though not
tested. (I'm reformatting for readability as well as correcting the
syntax.)

* Untested. RECODE may not accept vector elements such as MC(i).
* If it does not, DO REPEAT over MC1 TO MC12 is the usual substitute.
VECTOR MC = MC1 TO MC12.
VECTOR ia(12,F2).
LOOP i = 1 TO 12.
.  DO IF MC(i) = 'A'.
.     RECODE MC(i)
          ('A'=1)('1'=1) (ELSE = 0) INTO ia(i).
.  ELSE IF MC(i) NE 'A'.
.     RECODE  MC(i)
          ('B'=0) ('C'=0) ('D'=0) ('1'=1)
          (ELSE=SYSMIS)             INTO  ia(i).
.  END IF.
END LOOP.

Now, that is syntactically valid SPSS (modulo any mistakes I've made);
but it probably isn't what you want. At least, in the first RECODE, the
clauses
    ('1'=1)
and
    (else = 0)
will never be effective, since the RECODE is only executed if MC(i) is
'A'. The whole appears to be equivalent to,

.     recode MC(i)
          ('A'=1)
          ('B'=0) ('C'=0) ('D'=0) ('1'=1)
          (else=sysmis)             INTO  ia(i).

Do you want this, or something else? In particular, observe that
responses '2', '3', and '4' recode into system-missing. I would, at
least, recode them into numerical 2, 3 and 4, and make those values
user-missing.

>This way if the answer isn't A then I'm only comparing the A response
>rate to the correct response rate, as opposed to comparing the A
>response rate to everything else.

And here, I'm afraid, I lose your meaning, and can only ask for
clarification.

Re: Conditional recoding?

Maguin, Eugene
In reply to this post by Smith, Larissa
Larissa,

Would you provide some sample data (two or three cases of made up data would
be fine) as it would look before it goes into the transformation you want to
make and then what it would look like after it comes out. I have to confess
that I can't visualize what you are describing. Also, in the code you
provided, I noticed that you don't mention ib or ic or id. How do those fit
in?

Thanks, Gene Maguin

Re: Conditional recoding?

Smith, Larissa
In reply to this post by Richard Ristow
Richard,

Thanks - let me see if I can clarify a little.

I have 12 multiple-choice questions with four answer options each.  By
the time the data get to me the answer options have been coded so they
look like this:

MC1, correct answer A:   A 2 3 4
MC2, correct answer B:   1 B 3 4
MC3, correct answer C:   1 2 C 4
MC4, correct answer D:   1 2 3 D

Each question has one possible correct and three possible incorrect
answers, the exact values of which depend on what the item's correct
answer is.

What I need, specifically, is to generate - for each item - response
proportions and item-total correlations for the correct answer and for
each distractor. So in a sense each item does represent four
alternatives; I'm just stringing them out now into four explicit
variables (proportion answering A for each item, proportion answering B
for each item, etc.).

The way I need to recode them is this, assuming that I'm generating
stats for Response Option A:

MC1, correct answer A:      A 2 3 4
Recoded:                    1 0 0 0

MC2, correct answer B:      1 B 3 4
Recoded:                    1 0 . .

And so forth, where . is system- or user-missing.
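
The two examples above pin down a rule that depends on both the response and the item's correct answer. As an illustration only, here is a minimal sketch of that rule in Python rather than SPSS syntax. The names `chosen_option` and `recode_option`, and the use of `None` for missing, are assumptions made for the sketch and are not part of the original data or code.

```python
# Sketch of the recoding rule shown in the two examples above.
# A letter means the examinee chose the correct option; a digit 1-4
# means the examinee chose the incorrect option in that position.

OPTIONS = ("A", "B", "C", "D")


def chosen_option(response):
    """Return the option letter the examinee picked."""
    if response in OPTIONS:
        return response
    return OPTIONS[int(response) - 1]


def recode_option(response, option, key):
    """Recode one response for the statistics of one answer option.

    response: raw value such as 'A' or '3'
    option:   the option being analysed ('A'..'D')
    key:      the item's correct answer ('A'..'D')
    Returns 1, 0, or None (missing).
    """
    chosen = chosen_option(response)
    if chosen == option:
        return 1      # examinee picked the option being analysed
    if option == key:
        return 0      # analysing the correct answer: all other picks are 0
    if chosen == key:
        return 0      # correct responders form the comparison group
    return None       # other distractors are set to missing


mc1 = [recode_option(r, "A", "A") for r in ["A", "2", "3", "4"]]
mc2 = [recode_option(r, "A", "B") for r in ["1", "B", "3", "4"]]
print(mc1)  # [1, 0, 0, 0]
print(mc2)  # [1, 0, None, None]
```

The point of the sketch is that the rule needs the item's key as an input; a single RECODE over all items cannot express it because the key differs from item to item.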

This code:

RECODE
  MC1 to MC12
  ('A'=1)  ('1'=1) ('B'=0)('C'=0)('D'=0) (ELSE=SYSMIS)  INTO  ia1 TO ia12.

Produces the second type of contrast for option A if the correct options
are B-D, but results in a constant if the correct answer is A.  What I'm
trying to avoid is doing 48 recode statements or writing code that I
have to rewrite every time I analyze a different twelve-item set.  I
suspected it was going to involve vectors but it looks from the code you
wrote out like I was formatting the commands wrong.


- Larissa Smith



Re: Conditional recoding?

Smith, Larissa
In reply to this post by Smith, Larissa
Gene,

The ib, ic, and id are response option codes; so, for instance, if an item's
correct answer is A I'll have proportion-endorsed and item-total
correlations for A (the main item stats), ib, ic, and id (representing the
three other possible answers).  Here are some sample data:

A 1 3 2 B C 2
2 1 B 4 B 2 2
A C B C 3 C A


Student 1 got items 1, 5, and 6 right.  Student 2 got items 2 and 6 right,
and so forth.

Say I'm doing the item stats for the fourth item, for which C is the correct
answer.  Above is what the data look like before recoding.  After recoding,
right now they look like this:

0...
0...
1...

Which is what I need it to be when I'm doing the main item stats.  For ib,
the item stats for answer option B, I would need it to look like:

1...
[missing]...
0...

and for id, item stats for option d:

[missing]...
0...
1...

I could do this with an extremely long series of recode statements, but I'd
like to have short code that I can easily port from one project to another.

- Larissa



Re: Conditional recoding?

Richard Ristow
Sounds like Gene's got farther in understanding this one than I have.

At 11:58 AM 4/17/2007, Larissa Smith wrote:

>The ib, ic, and id are response option codes; so, for instance, if an
>item's correct answer is A I'll have proportion-endorsed and item-total
>correlations for A (the main item stats), ib, ic, and id (representing
>the three other possible answers).

Here's where I'm stuck. These may be psychological technical terms.
What are
"proportion-endorsed correlations"
"item-total correlations"?

>   Here are some sample data:
>
>A 1 3 2 B C 2
>2 1 B 4 B 2 2
>A C B C 3 C A
>
>Student 1 got items 1, 5, and 6 right.  Student 2 got items 2 and 6
>right, and so forth. Say I'm doing the item stats for the fourth item,
>for which C is the correct answer.  Above is what the data look like
>before recoding.  After recoding, right now they look like this:
>
>0...
>0...
>1...

OK, the first column is, 'correct' or 'not correct'. Why are the other
three missing? Under what circumstances (in words, not code) should
they be other than missing? What, in short, do the four columns derived
from one response, represent?

>Which is what I need it to be when I'm doing the main item stats.  For
>ib, the item stats for answer option B, I would need it to look like:
>
>1...
>[missing]...
>0...
>
>and for id, item stats for option d:
>
>[missing]...
>0...
>1...
>
>I could do this with an extremely long series of recode statements,
>but I'd like to have short code that I can easily port from one
>project to another.

The immediate answer to this, really without understanding what you're
doing, is that a problem like this may best be solved by 'unrolling' to
what's called 'long' form - using VARSTOCASES so each response gets its
own record. It requires a case ID variable, which you should have in
any case. Like this, using your example data (WRR-not saved
separately):

CASEID MC1 MC2 MC3 MC4 MC5 MC6 MC7

   01   A   1   3   2   B   C   2
   02   2   1   B   4   B   2   2
   02   A   C   B   C   3   C   A


Number of cases read:  3    Number of cases listed:  3


VARSTOCASES
  /MAKE MC_Resp FROM MC1 MC2 MC3 MC4 MC5 MC6 MC7
  /INDEX = VarName(MC_Resp)
  /KEEP =  CASEID
  /NULL =  DROP.


Variables to Cases

Notes
|----------------------------|---------------------------|
|Output Created              |17-APR-2007 14:14:51       |
|----------------------------|---------------------------|
Generated Variables
|-------|------|
|Name   |Label |
|-------|------|
|VarName|<none>|
|-------|------|
|MC_Resp|<none>|
|-------|------|

Processing Statistics
|-------------|-|
|Variables In |8|
|-------------|-|
|Variables Out|3|
|-------------|-|


LIST.

List
|-----------------------------|---------------------------|
|Output Created               |17-APR-2007 14:14:52       |
|-----------------------------|---------------------------|
CASEID VarName MC_Resp

   01   MC1     A
   01   MC2     1
   01   MC3     3
   01   MC4     2
   01   MC5     B
   01   MC6     C
   01   MC7     2
   02   MC1     2
   02   MC2     1
   02   MC3     B
   02   MC4     4
   02   MC5     B
   02   MC6     2
   02   MC7     2
   02   MC1     A
   02   MC2     C
   02   MC3     B
   02   MC4     C
   02   MC5     3
   02   MC6     C
   02   MC7     A

Number of cases read:  21    Number of cases listed:  21

Now, you potentially have four recodes instead of 48. And if you're
doing correlations or proportions, either within or between subjects,
it's usually easier in 'long' form.
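
As a rough illustration of what the 'long' form looks like outside SPSS, here is a minimal Python sketch of the same unrolling. It is not VARSTOCASES itself; the function name `vars_to_cases` is hypothetical, and skipping `None` values is only loosely analogous to `/NULL = DROP`.

```python
def vars_to_cases(rows, id_name, var_names):
    """Turn one record per case into one record per (case, variable)."""
    long_rows = []
    for row in rows:
        for name in var_names:
            value = row.get(name)
            if value is None:  # skip empty responses
                continue
            long_rows.append(
                {id_name: row[id_name], "VarName": name, "MC_Resp": value}
            )
    return long_rows


# Two made-up cases with three items each (hypothetical data).
wide = [
    {"CASEID": 1, "MC1": "A", "MC2": "1", "MC3": "3"},
    {"CASEID": 2, "MC1": "2", "MC2": "1", "MC3": "B"},
]

long_form = vars_to_cases(wide, "CASEID", ["MC1", "MC2", "MC3"])
for record in long_form:
    print(record)
```

Once the data are in this shape, a per-item lookup of the correct answer can be attached to every record, so one rule covers all items.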

However, you do need some question-specific data; clearly, what you do
with the response to a question, depends on what the right answer to
that question is. This is where I get lost: what is the algorithm to
get the four values, based on the correct response and the actual
response?

And what summary statistics, within subjects or across subjects, do you
want? (As I wrote, you're using technical terms that are probably clear
if you know their definitions; but, I regret, I don't.)
===================
APPENDIX: Test data
===================
DATA LIST LIST
   /CASEID (F2)
    MC1 TO MC7 (7A1).
BEGIN DATA
01 A 1 3 2 B C 2
02 2 1 B 4 B 2 2
02 A C B C 3 C A
END DATA.
FORMATS CASEID (N2).
LIST.

Re: Conditional recoding?

Maguin, Eugene
Larissa and Richard,

>Here's where I'm stuck. These may be psychological technical terms.
>What are
>"proportion-endorsed correlations"
>"item-total correlations"?

I don't know what 'proportion endorsed correlations' are. Perhaps Larissa
can define this. 'Item-total correlations' are produced in the reliability
procedure and are the correlation between the item and the total score minus
that item's score. The basic idea is that good items on a test have a high
correlation between the item, scored as correct-incorrect and the test total
score minus that item's contribution.
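
To make the definition concrete, here is a minimal Python sketch (not SPSS, and not the RELIABILITY procedure's own code) that computes the correlation between one item and the total score with that item removed. The function names and the small 0/1 score matrix are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation; assumes neither list is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(scores, item_index):
    """Correlate one item with the total score minus that item."""
    item = [row[item_index] for row in scores]
    rest = [sum(row) - row[item_index] for row in scores]
    return pearson(item, rest)


# Four examinees by three items, scored 1 = correct, 0 = incorrect.
scores = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
]

r = corrected_item_total(scores, 0)
print(round(r, 4))  # 0.7071
```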

See if I understand things now. Larissa, you have an odd data set in that it
combines two kinds of information. If the person got the item correct, the
value of the variable is the correct answer recorded as an alphabetic
designator of the answer. If the person got the answer wrong, the value of
the variable is the ordinal designator of the answer. If a person gets item
10 correct and the correct answer is choice b=2, 'B' is the data value.
However, if the person selected choice 1=a, the data value recorded is '1'.
Is this the story of how the raw data are coded?

But, I still don't quite get it. Here's the sample data. I've added an id to
it.

S1 A 1 3 2 B C 2
S2 2 1 B 4 B 2 2
S3 A C B C 3 C A

Show us what ia(i), ib(i), ic(i), id(i) for i=1,7 equal. Put it in the
following format. No explanation, just all the numbers. We have to see the
pattern to propose code to create the desired pattern.

S1 A 1 3 2 B C 2 ia(1) .. Ia(7) Ib(1) .. Ib(7) ic(1) .. Ic(7) Id(1) .. Id(7)

S2 2 1 B 4 B 2 2 ia(1) .. Ia(7) Ib(1) .. Ib(7) ic(1) .. Ic(7) Id(1) .. Id(7)

S3 A C B C 3 C A ia(1) .. Ia(7) Ib(1) .. Ib(7) ic(1) .. Ic(7) Id(1) .. Id(7)



Gene Maguin

Re: Conditional recoding?

Richard Ristow
At 05:38 PM 4/17/2007, Gene Maguin wrote:

>I don't know what 'proportion endorsed correlations' are. Perhaps
>Larissa can define this. 'Item-total correlations' are produced in the
>reliability procedure and are the correlation between the item and the
>total score minus that item's score.

Thank you, Gene.

>I still don't quite get it. Here's the sample data. I've added an id
>to it.
>
>S1 A 1 3 2 B C 2
>S2 2 1 B 4 B 2 2
>S3 A C B C 3 C A
>
>Show us what ia(i), ib(i), ic(i), id(i) for i=1,7 equal. Put it in the
>following format. No explanation, just all the numbers. We have to see
>the pattern to propose code to create the desired pattern.
>
>S1 A 1 3 2 B C 2 ia(1) .. Ia(7) Ib(1) .. Ib(7) ic(1) .. Ic(7) Id(1) .. Id(7)
>
>S2 2 1 B 4 B 2 2 ia(1) .. Ia(7) Ib(1) .. Ib(7) ic(1) .. Ic(7) Id(1) .. Id(7)
>
>S3 A C B C 3 C A ia(1) .. Ia(7) Ib(1) .. Ib(7) ic(1) .. Ic(7) Id(1) .. Id(7)

Or maybe, since that's a pretty unwieldy data line, in 'long' form:

S1 1 A <Ia> <Ib> <Ic> <Id>
S1 2 1 <Ia> <Ib> <Ic> <Id>
S1 3 3 <Ia> <Ib> <Ic> <Id>
...
S3 5 3 <Ia> <Ib> <Ic> <Id>
S3 6 C <Ia> <Ib> <Ic> <Id>
S3 7 A <Ia> <Ib> <Ic> <Id>

Re: Conditional recoding?

Smith, Larissa
>Here's where I'm stuck. These may be psychological technical terms.
>What are
>"proportion-endorsed correlations"
>"item-total correlations"?

The item-total correlation is just the corrected item-total correlation
from SPSS' reliability procedure - like Gene said, the correlation
between the item and the total test score with item removed.  The
proportion endorsed is the proportion of students in the population who
got the item correct or who put down a particular response option as
their answer.
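
For concreteness, the proportion endorsed can be sketched in a few lines of Python (again, not SPSS syntax). The function names are hypothetical; the three responses are item 4 from the sample cases posted earlier, for which C is the correct answer.

```python
from collections import Counter

OPTIONS = ("A", "B", "C", "D")


def proportion_endorsed(responses):
    """Share of examinees choosing each option, correct or not."""
    chosen = [r if r in OPTIONS else OPTIONS[int(r) - 1] for r in responses]
    counts = Counter(chosen)
    n = len(responses)
    return {opt: counts[opt] / n for opt in OPTIONS}


def proportion_correct(responses):
    """Share of examinees whose response is a letter, i.e. correct."""
    return sum(r in OPTIONS for r in responses) / len(responses)


item4 = ["2", "4", "C"]  # item 4 for the three sample students
props = proportion_endorsed(item4)
print(props)
print(proportion_correct(item4))
```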

>OK, the first column is, 'correct' or 'not correct'. Why are the other
>three missing?

Sorry, that wasn't a dot as in missing, it was an ellipsis as in "and so
forth."  There's no missing data, I just didn't fill in the entire
matrix.

>What, in short, do the four columns derived
>from one response, represent?

For the point-biserial correlations, in order: (1) The corrected
item-total correlation between answer option A and the total with the
item removed; (2) the corrected item-total correlation between answer
option B and the total with the item removed; (3) the corrected
item-total correlation between answer option C and the total with the
item removed; and (4) the corrected item-total correlation between
answer option D and the total with the item removed.

I'll try the VARSTOCASES - it looks like it might be helpful.

>what is the algorithm to
>get the four values, based on the correct response and the actual
>response?

This is going to be very long, but it will probably be more helpful than
my trying to explain.  This is the code I'm currently using. The
variable P_VALUE is not actually a p<.05 type of p value; it's a
proportion-endorsed value for how many examinees got that item right or
selected that answer option.

RECODE
  I1 to I7
  ('A'=1)  ('1'=1) (ELSE=0)  INTO  ia1 TO ia7.
EXECUTE .
RECODE
  I1 to I7
  ('B'=1) ('2'=1) (ELSE=0)  INTO  ib1 TO ib7.
EXECUTE .
RECODE
  I1 to I7
  ('C'=1) ('3'=1) (ELSE=0)  INTO  ic1 TO ic7.
EXECUTE .
RECODE
  I1 to I7
  ('D'=1) ('4'=1) (ELSE=0)  INTO  id1 TO id7.
EXECUTE .


* Start MATRIX procedure for operational items.
* Change the variable name for the total score for each subject.

SET MXLOOP=8000000 WIDTH=200 LENGTH=NONE.
MATRIX.
GET mir/VARIABLES=item1 TO item7.
GET mia/VARIABLES=ia1 to ia7.
GET mib/VARIABLES=ib1 to ib7.
GET mic/VARIABLES=ic1 to ic7.
GET mid/VARIABLES=id1 to id7.

GET TOTAL/VARIABLES=total.

COMPUTE ID1={1:7}.

COMPUTE ID=T(ID1).
COMPUTE ncases=NROW(mir).
COMPUTE ntot=T(CSUM(mir)).
COMPUTE ntota=T(CSUM(mia)).
COMPUTE ntotb=T(CSUM(mib)).
COMPUTE ntotc=T(CSUM(mic)).
COMPUTE ntotd=T(CSUM(mid)).
COMPUTE P_VALUE1=CSUM(mir)/ncases.
COMPUTE PA1=CSUM(mia)/ncases.
COMPUTE PB1=CSUM(mib)/ncases.
COMPUTE PC1=CSUM(mic)/ncases.
COMPUTE PD1=CSUM(mid)/ncases.
COMPUTE P_VALUE=T(P_VALUE1).
COMPUTE PA=T(PA1).
COMPUTE PB=T(PB1).
COMPUTE PC=T(PC1).
COMPUTE PD=T(PD1).
COMPUTE omit=1-PA-PB-PC-PD.
COMPUTE pbs=MAKE(7,1,0).
COMPUTE pbsa=MAKE(7,1,0).
COMPUTE pbsb=MAKE(7,1,0).
COMPUTE pbsc=MAKE(7,1,0).
COMPUTE pbsd=MAKE(7,1,0).
COMPUTE Q_VALUE=1-P_VALUE.
COMPUTE QA=1-PA.
COMPUTE QB=1-PB.
COMPUTE QC=1-PC.
COMPUTE QD=1-PD.
****COMPUTE TOTAL=RSUM(mir).
COMPUTE TOTALr=MAKE(ncases,7,0).
COMPUTE TOTALa=MAKE(ncases,7,0).
COMPUTE TOTALb=MAKE(ncases,7,0).
COMPUTE TOTALc=MAKE(ncases,7,0).
COMPUTE TOTALd=MAKE(ncases,7,0).
LOOP i=1 to 7.
LOOP j=1 to ncases.
DO IF (mir(j,i) EQ 1).
COMPUTE TOTALr(j,i)=TOTAL(j).
END IF.
DO IF (mia(j,i) EQ 1).
COMPUTE TOTALa(j,i)=TOTAL(j).
END IF.
DO IF (mib(j,i) EQ 1).
COMPUTE TOTALb(j,i)=TOTAL(j).
END IF.
DO IF (mic(j,i) EQ 1).
COMPUTE TOTALc(j,i)=TOTAL(j).
END IF.
DO IF (mid(j,i) EQ 1).
COMPUTE TOTALd(j,i)=TOTAL(j).
END IF.
END LOOP.
END LOOP.
COMPUTE U11=T(CSUM(TOTALr)).
COMPUTE U11a=T(CSUM(TOTALa)).
COMPUTE U11b=T(CSUM(TOTALb)).
COMPUTE U11c=T(CSUM(TOTALc)).
COMPUTE U11d=T(CSUM(TOTALd)).
COMPUTE Ux=CSUM(TOTAL)/ncases.
COMPUTE INT0=TOTAL-Ux.
COMPUTE dx=SQRT(CSSQ(INT0)/(ncases-1)).
COMPUTE U1=MAKE(7,1,0).
COMPUTE U1a=MAKE(7,1,0).
COMPUTE U1b=MAKE(7,1,0).
COMPUTE U1c=MAKE(7,1,0).
COMPUTE U1d=MAKE(7,1,0).
LOOP i=1 to 7.
DO IF (ntot(i) NE 0).
COMPUTE U1(i)=U11(i)/ntot(i).
ELSE.
COMPUTE U1(i)=Ux.
END IF.

DO IF (ntota(i) NE 0).
COMPUTE U1a(i)=U11a(i)/ntota(i).
ELSE.
COMPUTE U1a(i)=Ux.
END IF.

DO IF (ntotb(i) NE 0).
COMPUTE U1b(i)=U11b(i)/ntotb(i).
ELSE.
COMPUTE U1b(i)=Ux.
END IF.

DO IF (ntotc(i) NE 0).
COMPUTE U1c(i)=U11c(i)/ntotc(i).
ELSE.
COMPUTE U1c(i)=Ux.
END IF.

DO IF (ntotd(i) NE 0).
COMPUTE U1d(i)=U11d(i)/ntotd(i).
ELSE.
COMPUTE U1d(i)=Ux.
END IF.
END LOOP.
COMPUTE INT1=MAKE(7,1,0).
COMPUTE INT1a=MAKE(7,1,0).
COMPUTE INT1b=MAKE(7,1,0).
COMPUTE INT1c=MAKE(7,1,0).
COMPUTE INT1d=MAKE(7,1,0).
COMPUTE INT2=(U1-Ux)/dx.
COMPUTE INT2a=(U1a-Ux)/dx.
COMPUTE INT2b=(U1b-Ux)/dx.
COMPUTE INT2c=(U1c-Ux)/dx.
COMPUTE INT2d=(U1d-Ux)/dx.
COMPUTE INT3=MAKE(7,1,0).
COMPUTE INT3a=MAKE(7,1,0).
COMPUTE INT3b=MAKE(7,1,0).
COMPUTE INT3c=MAKE(7,1,0).
COMPUTE INT3d=MAKE(7,1,0).
COMPUTE PTBN=MAKE(7,1,0).
COMPUTE PTBNa=MAKE(7,1,0).
COMPUTE PTBNb=MAKE(7,1,0).
COMPUTE PTBNc=MAKE(7,1,0).
COMPUTE PTBNd=MAKE(7,1,0).
LOOP i=1 to 7.
DO IF (Q_VALUE(i)<>0).

COMPUTE INT1(i)=SQRT(P_VALUE(i)/Q_VALUE(i)).
COMPUTE PTBN(i)=INT1(i)*INT2(i).
ELSE.
COMPUTE PTBN(i)=0.
END IF.
DO IF (QA(i)<>0).
     COMPUTE INT1a(i)=SQRT(PA(i)/QA(i)).
     COMPUTE PTBNa(i)=INT1a(i)*INT2a(i).

ELSE.
 COMPUTE PTBNa(i)=0.
END IF.
DO IF (QB(i)<>0).
COMPUTE INT1b(i)=SQRT(PB(i)/QB(i)).
COMPUTE PTBNb(i)=INT1b(i)*INT2b(i).

ELSE.
COMPUTE PTBNb(i)=0.
END IF.
DO IF (QC(i)<>0).
COMPUTE INT1c(i)=SQRT(PC(i)/QC(i)).
COMPUTE PTBNc(i)=INT1c(i)*INT2c(i).

ELSE.
COMPUTE PTBNc(i)=0.
END IF.
DO IF (QD(i)<>0).
COMPUTE INT1d(i)=SQRT(PD(i)/QD(i)).
COMPUTE PTBNd(i)=INT1d(i)*INT2d(i).

ELSE.
COMPUTE PTBNd(i)=0.
END IF.
COMPUTE INT3(i)=P_VALUE(i)*Q_VALUE(i).
COMPUTE INT3a(i)=PA(i)*QA(i).
COMPUTE INT3b(i)=PB(i)*QB(i).
COMPUTE INT3c(i)=PC(i)*QC(i).
COMPUTE INT3d(i)=PD(i)*QD(i).
END LOOP.
COMPUTE INT4=SQRT(INT3).
COMPUTE INT4a=SQRT(INT3a).
COMPUTE INT4b=SQRT(INT3b).
COMPUTE INT4c=SQRT(INT3c).
COMPUTE INT4d=SQRT(INT3d).
LOOP i=1 to 7.
COMPUTE pbs(i)=(PTBN(i)*dx-INT4(i))/
   (SQRT(INT3(i)+dx*dx-2*PTBN(i)*dx*INT4(i))).
COMPUTE pbsa(i)=(PTBNa(i)*dx-INT4a(i))/
   (SQRT(INT3a(i)+dx*dx-2*PTBNa(i)*dx*INT4a(i))).
COMPUTE pbsb(i)=(PTBNb(i)*dx-INT4b(i))/
   (SQRT(INT3b(i)+dx*dx-2*PTBNb(i)*dx*INT4b(i))).
COMPUTE pbsc(i)=(PTBNc(i)*dx-INT4c(i))/
   (SQRT(INT3c(i)+dx*dx-2*PTBNc(i)*dx*INT4c(i))).
COMPUTE pbsd(i)=(PTBNd(i)*dx-INT4d(i))/
   (SQRT(INT3d(i)+dx*dx-2*PTBNd(i)*dx*INT4d(i))).
END LOOP.
SAVE {ID,P_VALUE,
pbs,PA,PB,PC,PD,OMIT,pbsa,pbsb,pbsc,pbsd}/OUTFILE='outfile.sav'
 /VARIABLES=ID,P_VALUE,pbs,pa,pb,pc,pd,omit,pbsa,pbsb,pbsc,pbsd.
END MATRIX.
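
The arithmetic in the MATRIX job above can be hard to follow, so here is a compact Python sketch of the same two steps for a single 0/1 indicator: the point-biserial correlation with the total (PTBN), then the correction that removes the indicator from the total (pbs). It is an illustration under stated assumptions, not a port: the function name is hypothetical, and the sketch uses the population standard deviation (dividing by n), whereas the MATRIX code computes dx with ncases-1.

```python
from math import sqrt


def corrected_point_biserial(item, total):
    """item: list of 0/1 values; total: total scores that include the item.

    Assumes the totals are not all equal and that the total minus the
    item is not constant (otherwise a denominator is zero).
    """
    n = len(item)
    p = sum(item) / n
    q = 1 - p
    if p == 0 or q == 0:
        return 0.0  # a constant item also ends up as 0 in the MATRIX code

    mean_all = sum(total) / n                                   # Ux
    sd_all = sqrt(sum((t - mean_all) ** 2 for t in total) / n)  # dx (by n)
    mean_1 = sum(t for x, t in zip(item, total) if x == 1) / sum(item)  # U1

    r_pb = sqrt(p / q) * (mean_1 - mean_all) / sd_all           # PTBN
    sd_item = sqrt(p * q)                                       # INT4

    denominator = sqrt(sd_item ** 2 + sd_all ** 2 - 2 * r_pb * sd_all * sd_item)
    return (r_pb * sd_all - sd_item) / denominator              # pbs


item = [1, 1, 0, 0]
total = [3, 2, 1, 0]
result = corrected_point_biserial(item, total)
print(round(result, 4))  # 0.7071
```

With the population standard deviation, the result equals the ordinary Pearson correlation between the indicator and the total minus the indicator; with the n-1 form the two differ slightly.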

>And what summary statistics, within subjects or across subjects, do you
>want?

Across subjects, the corrected item-total correlations for each response
option - the corrected correlation between response option A and the
total, the corrected correlation between response option B and the
total, and so forth.  The snag is that instead of calculating the
correlation with, for example, A and 1 set to 1 and everything else set
to zero as in the recode statement above, I need to calculate it with A
and 1 set to 0, the correct answer option set to 1, and the other
response options set to missing.  This is easy to do when the correct
answer is not in fact A; I'm just not sure how to code to correct for
when it is.

- Larissa

Re: Conditional recoding?

Maguin, Eugene
In reply to this post by Smith, Larissa
Larissa,

I think I have seen some output like this before and it was from an OMR
program for scoring tests. I think the whole thing was from NCS and ran on a
little PDP computer.

Your reply was unexpected. I expected something like this

                   ia(1) .. Ia(7) Ib(1) .. Ib(7) ic(1) .. Ic(7) Id(1) .. Id(7)
S1 A 1 3 2 B C 2   1 1 0 0 0 0 0  0 0 0 1 1 0 1  0 0 1 0 0 1 0  0 0 0 0 0 0 0
S2 2 1 B 4 B 2 2   0 1 0 0 0 0 0  1 0 1 0 1 1 1  0 0 0 0 0 0 0  0 0 0 1 0 0 0
S3 A C B C 3 C A   1 0 0 0 0 0 1  0 0 1 0 0 0 0  0 1 0 1 1 1 0  0 0 0 0 0 0 0

I probably made some mistakes in my patterning but my rule was that if for
item (i), the response was either 'A' or '1' then ia(i)=1; otherwise,
ia(i)=0.
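
The rule just stated (a letter or its position digit counts as choosing that option) is simple enough to check mechanically. The following Python sketch, which is an illustration and not SPSS syntax, applies it to the three sample cases; the function name `indicator` is made up for the example.

```python
OPTIONS = ("A", "B", "C", "D")


def indicator(response, option):
    """1 if the response is the option's letter or its position digit."""
    digit = str(OPTIONS.index(option) + 1)
    return 1 if response in (option, digit) else 0


cases = {
    "S1": "A 1 3 2 B C 2".split(),
    "S2": "2 1 B 4 B 2 2".split(),
    "S3": "A C B C 3 C A".split(),
}

# For each student, one list of seven 0/1 values per answer option.
table = {
    sid: {opt: [indicator(r, opt) for r in responses] for opt in OPTIONS}
    for sid, responses in cases.items()
}

for sid, row in table.items():
    print(sid, *("".join(str(v) for v in row[opt]) for opt in OPTIONS))
```

Run on these three cases, the sketch gives the same 0/1 pattern as the table above.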

I don't think this is correct but this is the result I need to see. The
leftmost set of columns is the input to the transformation commands and the
right most set of 4 columns is the output. I need to see the correct output
for the transformations you want to know how to do.

Gene Maguin

stripping characters from a string variable

Zdaniuk, Bozena
In reply to this post by Smith, Larissa
Hello, everybody. I have an 11-character string variable ID (e.g., the
datum is 123456abcde). I would like to strip off characters 2-3 and
7-11 so that I have a new variable containing only characters 1 and 4-6
(1456). Could someone point me to, or give me, syntax that I can
adapt for this task? Thanks a lot.
Bozena

Bozena Zdaniuk, Ph.D.

University of Pittsburgh

UCSUR, 6th Fl.

121 University Place

Pittsburgh, PA 15260

Ph.: 412-624-5736

Fax: 412-624-4810

email: [hidden email]
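
The operation asked about here amounts to concatenating two substrings. As an illustration in Python rather than SPSS syntax, a minimal sketch follows; the helper name `keep_positions` is hypothetical, and positions are 1-based and inclusive to match the wording of the question.

```python
def keep_positions(value, spans):
    """Keep 1-based inclusive character spans, e.g. [(1, 1), (4, 6)]."""
    return "".join(value[start - 1:end] for start, end in spans)


new_id = keep_positions("123456abcde", [(1, 1), (4, 6)])
print(new_id)  # 1456
```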

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Smith, Larissa
Sent: Wednesday, April 18, 2007 10:17 AM
To: [hidden email]
Subject: Re: Conditional recoding?

>Here's where I'm stuck. These may be psychological technical terms.
>What are
>"proportion-endorsed correlations"
>"item-total correlations"?

The item-total correlation is just the corrected item-total correlation
from SPSS' reliability procedure - like Gene said, the correlation
between the item and the total test score with item removed.  The
proportion endorsed is the proportion of students in the population who
got the item correct or who put down a particular response option as
their answer.

>OK, the first column is, 'correct' or 'not correct'. Why are the other
>three missing?

Sorry, that wasn't a dot as in missing, it was an ellipsis as in "and so
forth."  There's no missing data, I just didn't fill in the entire
matrix.

>What, in short, do the four columns derived
>from one response, represent?

For the point-biserial correlations, in order: (1) The corrected
item-total correlation between answer option A and the total with the
item removed; (2) the corrected item-total correlation between answer
option B and the total with the item removed; (3) the corrected
item-total correlation between answer option C and the total with the
item removed; and (4) the corrected item-total correlation between
answer option D and the total with the item removed.

I'll try the VARSTOCASES - it looks like it might be helpful.

>what is the algorithm to
>get the four values, based on the correct response and the actual
>response?

This is going to be very long, but it will probably be more helpful than
my trying to explain.  This is the code I'm currently using. The
variable P_VALUE is not actually a p<.05 type of p value; it's a
proportion-endorsed value for how many examinees got that item right or
selected that answer option.

RECODE
  I1 to I7
  ('A'=1)  ('1'=1) (ELSE=0)  INTO  ia1 TO ia7.
EXECUTE .
RECODE
  I1 to I7
  ('B'=1) ('2'=1) (ELSE=0)  INTO  ib1 TO ib7.
EXECUTE .
RECODE
  I1 to I7
  ('C'=1) ('3'=1) (ELSE=0)  INTO  ic1 TO ic7.
EXECUTE .
RECODE
  I1 to I7
  ('D'=1) ('4'=1) (ELSE=0)  INTO  id1 TO id7.
EXECUTE .


*********start MATRIX procedure for operational
items***************************************************************
** Change the variable name for the total score for each subject

SET MXLOOP=8000000 WIDTH=200 LENGTH=NONE.
MATRIX.
GET mir/VARIABLES=item1 TO item7.
GET mia/VARIABLES=ia1 to ia7.
GET mib/VARIABLES=ib1 to ib7.
GET mic/VARIABLES=ic1 to ic7.
GET mid/VARIABLES=id1 to id7.

GET TOTAL/VARIABLES=total.

COMPUTE ID1={1:7}.

COMPUTE ID=T(ID1).
COMPUTE ncases=NROW(mir).
COMPUTE ntot=T(CSUM(mir)).
COMPUTE ntota=T(CSUM(mia)).
COMPUTE ntotb=T(CSUM(mib)).
COMPUTE ntotc=T(CSUM(mic)).
COMPUTE ntotd=T(CSUM(mid)).
COMPUTE P_VALUE1=CSUM(mir)/ncases.
COMPUTE PA1=CSUM(mia)/ncases.
COMPUTE PB1=CSUM(mib)/ncases.
COMPUTE PC1=CSUM(mic)/ncases.
COMPUTE PD1=CSUM(mid)/ncases.
COMPUTE P_VALUE=T(P_VALUE1).
COMPUTE PA=T(PA1).
COMPUTE PB=T(PB1).
COMPUTE PC=T(PC1).
COMPUTE PD=T(PD1).
COMPUTE omit=1-PA-PB-PC-PD.
COMPUTE pbs=MAKE(7,1,0).
COMPUTE pbsa=MAKE(7,1,0).
COMPUTE pbsb=MAKE(7,1,0).
COMPUTE pbsc=MAKE(7,1,0).
COMPUTE pbsd=MAKE(7,1,0).
COMPUTE Q_VALUE=1-P_VALUE.
COMPUTE QA=1-PA.
COMPUTE QB=1-PB.
COMPUTE QC=1-PC.
COMPUTE QD=1-PD.
****COMPUTE TOTAL=RSUM(mir).
COMPUTE TOTALr=MAKE(ncases,7,0).
COMPUTE TOTALa=MAKE(ncases,7,0).
COMPUTE TOTALb=MAKE(ncases,7,0).
COMPUTE TOTALc=MAKE(ncases,7,0).
COMPUTE TOTALd=MAKE(ncases,7,0).
LOOP i=1 to 7.
LOOP j=1 to ncases.
DO IF (mir(j,i) EQ 1).
COMPUTE TOTALr(j,i)=TOTAL(j).
END IF.
DO IF (mia(j,i) EQ 1).
COMPUTE TOTALa(j,i)=TOTAL(j).
END IF.
DO IF (mib(j,i) EQ 1).
COMPUTE TOTALb(j,i)=TOTAL(j).
END IF.
DO IF (mic(j,i) EQ 1).
COMPUTE TOTALc(j,i)=TOTAL(j).
END IF.
DO IF (mid(j,i) EQ 1).
COMPUTE TOTALd(j,i)=TOTAL(j).
END IF.
END LOOP.
END LOOP.
COMPUTE U11=T(CSUM(TOTALr)).
COMPUTE U11a=T(CSUM(TOTALa)).
COMPUTE U11b=T(CSUM(TOTALb)).
COMPUTE U11c=T(CSUM(TOTALc)).
COMPUTE U11d=T(CSUM(TOTALd)).
COMPUTE Ux=CSUM(TOTAL)/ncases.
COMPUTE INT0=TOTAL-Ux.
COMPUTE dx=SQRT(CSSQ(INT0)/(ncases-1)).
COMPUTE U1=MAKE(7,1,0).
COMPUTE U1a=MAKE(7,1,0).
COMPUTE U1b=MAKE(7,1,0).
COMPUTE U1c=MAKE(7,1,0).
COMPUTE U1d=MAKE(7,1,0).
LOOP i=1 to 7.
DO IF (ntot(i) NE 0).
COMPUTE U1(i)=U11(i)/ntot(i).
ELSE.
COMPUTE U1(i)=Ux.
END IF.

DO IF (ntota(i) NE 0).
COMPUTE U1a(i)=U11a(i)/ntota(i).
ELSE.
COMPUTE U1a(i)=Ux.
END IF.

DO IF (ntotb(i) NE 0).
COMPUTE U1b(i)=U11b(i)/ntotb(i).
ELSE.
COMPUTE U1b(i)=Ux.
END IF.

DO IF (ntotc(i) NE 0).
COMPUTE U1c(i)=U11c(i)/ntotc(i).
ELSE.
COMPUTE U1c(i)=Ux.
END IF.

DO IF (ntotd(i) NE 0).
COMPUTE U1d(i)=U11d(i)/ntotd(i).
ELSE.
COMPUTE U1d(i)=Ux.
END IF.
END LOOP.
COMPUTE INT1=MAKE(7,1,0).
COMPUTE INT1a=MAKE(7,1,0).
COMPUTE INT1b=MAKE(7,1,0).
COMPUTE INT1c=MAKE(7,1,0).
COMPUTE INT1d=MAKE(7,1,0).
COMPUTE INT2=(U1-Ux)/dx.
COMPUTE INT2a=(U1a-Ux)/dx.
COMPUTE INT2b=(U1b-Ux)/dx.
COMPUTE INT2c=(U1c-Ux)/dx.
COMPUTE INT2d=(U1d-Ux)/dx.
COMPUTE INT3=MAKE(7,1,0).
COMPUTE INT3a=MAKE(7,1,0).
COMPUTE INT3b=MAKE(7,1,0).
COMPUTE INT3c=MAKE(7,1,0).
COMPUTE INT3d=MAKE(7,1,0).
COMPUTE PTBN=MAKE(7,1,0).
COMPUTE PTBNa=MAKE(7,1,0).
COMPUTE PTBNb=MAKE(7,1,0).
COMPUTE PTBNc=MAKE(7,1,0).
COMPUTE PTBNd=MAKE(7,1,0).
LOOP i=1 to 7.
DO IF (Q_VALUE(i)<>0).

COMPUTE INT1(i)=SQRT(P_VALUE(i)/Q_VALUE(i)).
COMPUTE PTBN(i)=INT1(i)*INT2(i).
ELSE.
COMPUTE PTBN(i)=0.
END IF.
DO IF (QA(i)<>0).
     COMPUTE INT1a(i)=SQRT(PA(i)/QA(i)).
     COMPUTE PTBNa(i)=INT1a(i)*INT2a(i).

ELSE.
 COMPUTE PTBNa(i)=0.
END IF.
DO IF (QB(i)<>0).
COMPUTE INT1b(i)=SQRT(PB(i)/QB(i)).
COMPUTE PTBNb(i)=INT1b(i)*INT2b(i).

ELSE.
COMPUTE PTBNb(i)=0.
END IF.
DO IF (QC(i)<>0).
COMPUTE INT1c(i)=SQRT(PC(i)/QC(i)).
COMPUTE PTBNc(i)=INT1c(i)*INT2c(i).

ELSE.
COMPUTE PTBNc(i)=0.
END IF.
DO IF (QD(i)<>0).
COMPUTE INT1d(i)=SQRT(PD(i)/QD(i)).
COMPUTE PTBNd(i)=INT1d(i)*INT2d(i).

ELSE.
COMPUTE PTBNd(i)=0.
END IF.
COMPUTE INT3(i)=P_VALUE(i)*Q_VALUE(i).
COMPUTE INT3a(i)=PA(i)*QA(i).
COMPUTE INT3b(i)=PB(i)*QB(i).
COMPUTE INT3c(i)=PC(i)*QC(i).
COMPUTE INT3d(i)=PD(i)*QD(i).
END LOOP.
COMPUTE INT4=SQRT(INT3).
COMPUTE INT4a=SQRT(INT3a).
COMPUTE INT4b=SQRT(INT3b).
COMPUTE INT4c=SQRT(INT3c).
COMPUTE INT4d=SQRT(INT3d).
LOOP i=1 to 7.
COMPUTE pbs(i)=(PTBN(i)*dx-INT4(i))/(SQRT(INT3(i)+dx*dx-2*PTBN(i)*dx*INT4(i))).
COMPUTE pbsa(i)=(PTBNa(i)*dx-INT4a(i))/(SQRT(INT3a(i)+dx*dx-2*PTBNa(i)*dx*INT4a(i))).
COMPUTE pbsb(i)=(PTBNb(i)*dx-INT4b(i))/(SQRT(INT3b(i)+dx*dx-2*PTBNb(i)*dx*INT4b(i))).
COMPUTE pbsc(i)=(PTBNc(i)*dx-INT4c(i))/(SQRT(INT3c(i)+dx*dx-2*PTBNc(i)*dx*INT4c(i))).
COMPUTE pbsd(i)=(PTBNd(i)*dx-INT4d(i))/(SQRT(INT3d(i)+dx*dx-2*PTBNd(i)*dx*INT4d(i))).
END LOOP.
SAVE {ID,P_VALUE,
pbs,PA,PB,PC,PD,OMIT,pbsa,pbsb,pbsc,pbsd}/OUTFILE='outfile.sav'
 /VARIABLES=ID,P_VALUE,pbs,pa,pb,pc,pd,omit,pbsa,pbsb,pbsc,pbsd.
END MATRIX.

>And what summary statistics, within subjects or across subjects, do you
>want?

Across subjects, the corrected item-total correlations for each response
option - the corrected correlation between response option A and the
total, the corrected correlation between response option B and the
total, and so forth.  The snag is that instead of calculating the
correlation with, for example, A and 1 set to 1 and everything else set
to zero as in the recode statement above, I need to calculate it with A
and 1 set to 0, the correct answer option set to 1, and the other
response options set to missing.  This is easy to do when the correct
answer is not in fact A; I'm just not sure how to code to correct for
when it is.

- Larissa
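
For readers following the MATRIX job above, the quantities it computes for each item (and, with the a/b/c/d suffixes, for each response option) can be written out as below. This is only a transcription of the posted code, with symbols chosen here for readability: p is P_VALUE, q = 1 - p is Q_VALUE, \bar{X}_1 is U1 (the mean total among cases scored 1), \bar{X} is Ux, and s_X is dx.

```latex
\[
r_{pb} = \frac{\bar{X}_1 - \bar{X}}{s_X}\,\sqrt{\frac{p}{q}},
\qquad
r_{\text{corrected}} =
\frac{r_{pb}\, s_X - \sqrt{pq}}
     {\sqrt{pq + s_X^{2} - 2\, r_{pb}\, s_X \sqrt{pq}}}
\]
```

Here r_pb corresponds to PTBN and r_corrected to pbs. As posted, the code divides by ncases-1 when computing s_X but uses pq (a variance with ncases in the denominator) for the item, so the two standard deviations are not on quite the same footing.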

Re: Conditional recoding?

Smith, Larissa
In reply to this post by Smith, Larissa
Gene,

Okay, let me try again.  Numbers in parentheses are item numbers. In the
interests of brevity and formatting I've only included the first two items.
 This is how I need them coded:

       data        ia(1) ib(1)  ic(1) id(1)  ia(2) ib(2) ic(2) id(2)
S1 A 1 3 2 B C 2    0     0      0     0      1     .     .      .
S2 2 1 B 4 B 2 2    .     1      .     .      1     .     .      .
S3 A C B C 3 C A    0     0      0     0      0     .     0      0


Each person's answer to question 1 is now represented by four columns.
Arranged differently:


      data         ia(1) ia(2) ia(3) ia(4) ia(5) ia(6) ia(7)
S1 A 1 3 2 B C 2    1     1     .     .     0     0     .
S2 2 1 B 4 B 2 2    .     1     0     .     0     .     .
S3 A C B C 3 C A    1     0     0     0     .     0     1

I'm looking at option A.  The correct answer to item 3 is B.  For item 3,
either the answer the examinee gave will be A (coded as 1) or it will be B
(coded as 0) or it will be neither A nor B (coded as missing).

The problem I'm having is in recoding ia(i) for items where A is the right
answer, as A cannot be both 1 and 0 and therefore the recoding results in a
constant.

- Larissa
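
One possible way around this, sketched under assumptions: the key is a property of the item rather than of the person, so the items can be grouped by key and each group given its own RECODE. The names I1 to I7 and ia1 to ia7 and the key A C B C B C A are taken from elsewhere in this thread and may not match the real file. The rule used for items keyed A (A = 1, everything else = 0) follows the first post, not the table above.

```spss
* Sketch only, assuming string items I1 to I7 with key A C B C B C A.
* Items keyed A: compare option A with every other response.
RECODE I1 I7 ('A' = 1) (ELSE = 0) INTO ia1 ia7.
* Items keyed B: chose A (coded 1) = 1, correct answer B = 0, others missing.
RECODE I3 I5 ('1' = 1) ('B' = 0) (ELSE = SYSMIS) INTO ia3 ia5.
* Items keyed C: chose A (coded 1) = 1, correct answer C = 0, others missing.
RECODE I2 I4 I6 ('1' = 1) ('C' = 0) (ELSE = SYSMIS) INTO ia2 ia4 ia6.
EXECUTE.
```

For the items keyed A, ia is then simply the right/wrong score for that item, so it is no longer a constant. The ib, ic and id sets would follow the same pattern with the groups rearranged.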

Re: striping characters from a string variable

Marks, Jim
In reply to this post by Zdaniuk, Bozena
STRING newstring (A4).
COMPUTE newstring = CONCAT(SUBSTR(oldstring,1,1),SUBSTR(oldstring,4,3)).

STRING defines the new string as 4 characters.
The first SUBSTR returns the first character; the second SUBSTR returns
characters 4, 5 and 6 (the third argument is a length, not an end position).
CONCAT joins them into a new string in the variable newstring.

Not tested but should work.

--jim

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Zdaniuk, Bozena
Sent: Wednesday, April 18, 2007 9:37 AM
To: [hidden email]
Subject: striping characters from a string variable

Hello, everybody. I have an 11 character string variable ID (e.g., the
datum is 123456abcde). I would like to strip off the characters 2-3 and
7-11 so that I have a new variable with only 1 and 4-6 characters
(1456) left. Could someone point me to or give me a syntax that I can
adapt for this task? Thanks a lot.
Bozena

Bozena Zdaniuk, Ph.D.

University of Pittsburgh

UCSUR, 6th Fl.

121 University Place

Pittsburgh, PA 15260

Ph.: 412-624-5736

Fax: 412-624-4810

email: [hidden email]


Re: Conditional recoding?

Maguin, Eugene
In reply to this post by Smith, Larissa
Larissa,

This helps a great deal but there still seem to be inconsistencies.

       data        ia(1) ib(1)  ic(1) id(1)  ia(2) ib(2) ic(2) id(2)
S1 A 1 3 2 B C 2    0     0      0     0      1     .     .      .
S2 2 1 B 4 B 2 2    .     1      .     .      1     .     .      .
S3 A C B C 3 C A    0     0      0     0      0     .     0      0

Shouldn't the value of ib(2) for Subject 3 (S3) be 0 and not '.'? If so, it
would fit with the overall pattern.

So the rule here seems to be
If the data value is alpha, then ia(i), ib(i), ic(i), id(i)=0.
If the data value is 1, ia(i)=1, ib(i), ic(i), id(i)='.';
If the data value is 2, ib(i)=1, ia(i), ic(i), id(i)='.';
If the data value is 3, ic(i)=1, ia(i), ib(i), id(i)='.';
If the data value is 4, id(i)=1, ia(i), ib(i), ic(i)='.'.

The code for this would be
Vector mc=mc1 to mc12/ia ib ic id(12).
Loop #i=1 to 12.
+  do if (mc(#i) eq 'A' or mc(#i) eq 'B' or mc(#i) eq 'C' or mc(#i) eq 'D').
+     compute ia(#i)=0.
+     compute ib(#i)=0.
+     compute ic(#i)=0.
+     compute id(#i)=0.
+  else if (mc(#i) eq '1').
+     compute ia(#i)=1.
+  else if (mc(#i) eq '2').
+     compute ib(#i)=1.
+  else if (mc(#i) eq '3').
+     compute ic(#i)=1.
+  else if (mc(#i) eq '4').
+     compute id(#i)=1.
+  end if.
End loop.
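
The same rule can probably also be written without VECTOR and LOOP, as one RECODE per answer option. This is a sketch that assumes mc1 to mc12 are adjacent string variables and that ia1 to id12 do not exist yet; it has not been run against the real data.

```spss
* Any letter = 0, the digit for this option = 1, anything else system-missing.
RECODE mc1 TO mc12 ('A','B','C','D' = 0) ('1' = 1) (ELSE = SYSMIS) INTO ia1 TO ia12.
RECODE mc1 TO mc12 ('A','B','C','D' = 0) ('2' = 1) (ELSE = SYSMIS) INTO ib1 TO ib12.
RECODE mc1 TO mc12 ('A','B','C','D' = 0) ('3' = 1) (ELSE = SYSMIS) INTO ic1 TO ic12.
RECODE mc1 TO mc12 ('A','B','C','D' = 0) ('4' = 1) (ELSE = SYSMIS) INTO id1 TO id12.
EXECUTE.
```

Either form leaves a variable system-missing whenever the response is a digit belonging to a different option.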


Each person's answer to question 1 is now represented by four columns.
Arranged differently:

      data         ia(1) ia(2) ia(3) ia(4) ia(5) ia(6) ia(7)
S1 A 1 3 2 B C 2    1     1     .     .     0     0     .
S2 2 1 B 4 B 2 2    .     1     0     .     0     .     .
S3 A C B C 3 C A    1     0     0     0     .     0     1


Larissa, comparing this section to the one above, I note that the value of
ia(1) for subjects 1 and 3 has changed from 0 to 1, and you provided no
explanation of the difference. Perhaps you mistyped. Perhaps you intended to
say that this is the data setup needed for another analysis. Or, god knows,
maybe you mistyped in the first data setup. If this section is mistyped, then
I think the above code will work. If this is the data setup for another
analysis, then you will have to provide the coding for ib, ic, and id. If you
mistyped in the first section, please correct both and resend them.

Re: Conditional recoding?

Richard Ristow
In reply to this post by Richard Ristow
At 09:54 AM 4/18/2007, Smith, Larissa wrote:

>>[Gene] Here's the sample data. I've added an id to it.
>>
>>S1 A 1 3 2 B C 2
>>S2 2 1 B 4 B 2 2
>>S3 A C B C 3 C A
>>
>>  Here's the output from the procedure I'm doing now, for that set of
>> sample data. [Lines drastically shortened by WRR]:
>
>Itm Key Prop. Pt.Bis.  A    B    C    D    Blank  R-A    R-B    R-C    R-D
>1   A   0.67   0.381   0.67 0.33 0    0    0      0.38  -0.69   0      0
>2   C   0.33   0.6804  0.67 0    0.33 0    0     -0.85   0      0.68   0
>3   B   0.67   0       0    0.67 0.33 0    0      0      0     -0.42   0
>4   C   0.33   0.6804  0    0.33 0.33 0.33 0      0     -0.42   0.68  -0.69
>5   B   0.67  -0.8581  0    0.67 0.33 0    0      0     -0.85   0.68   0
>6   C   0.67   0.381   0    0.33 0.67 0    0      0     -0.69   0.38   0
>7   A   0.33   0.6804  0.33 0.67 0    0    0      0.68  -0.85   0      0

>>Show us what ia(i), ib(i), ic(i), id(i) for i=1,7 equal.Id(7)
>
>I can't put it in exactly that format because the correlations are at
>the item level, not at the subject level.

Actually, you probably could, or at least it's done internally. The
summary statistics (correlation is a summary) are based, at their
roots, on properties of the individual responses. That's why you wanted
to recode the responses in the first place: to derive the properties
from which the summaries could be calculated.

Here's a beginning, deriving the properties and statistics for single
items and responses. I am not clear on how the other statistics are
derived. For any that correlate responses of different items, it'll
probably be necessary to return to 'wide' form, with all responses in
the same record; but you can do that, with all the recoding done in the
'long' version. Much simpler on the recoding, which is what you started
out by asking.

This is SPSS 15 draft output:

|-----------------------------|---------------------------|
|Output Created               |18-APR-2007 13:11:18       |
|-----------------------------|---------------------------|
ID It1 It2 It3 It4 It5 It6 It7

S1 A   1   3   2   B   C   2
S2 2   1   B   4   B   2   2
S3 A   C   B   C   3   C   A

Number of cases read:  3    Number of cases listed:  3


*  Unroll to 'long', one record per response, which greatly       .
*  facilitates analysis at the item level.                        .

VARSTOCASES
  /MAKE Response FROM It1 It2 It3 It4 It5 It6 It7
  /INDEX = Item(7)
  /KEEP =  ID
  /NULL = KEEP.

Variables to Cases
|----------------------------|---------------------------|
|Output Created              |18-APR-2007 13:11:21       |
|----------------------------|---------------------------|
Generated Variables
|--------|------|
|Name    |Label |
|--------|------|
|Item    |<none>|
|Response|<none>|
|--------|------|

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |18-APR-2007 13:11:21       |
|-----------------------------|---------------------------|
ID Item Response

S1    1 A
S1    2 1
S1    3 3
S1    4 2
S1    5 B
S1    6 C
S1    7 2
S2    1 2
S2    2 1
S2    3 B
S2    4 4
S2    5 B
S2    6 2
S2    7 2
S3    1 A
S3    2 C
S3    3 B
S3    4 C
S3    5 3
S3    6 C
S3    7 A

Number of cases read:  21    Number of cases listed:  21


*  Find correct response, by item. There should be a table for    .
*  this, but here's a 'cute' lash-up:                             .
*  Correct responses are letters, incorrect are digits.           .
*  Letters sort higher than digits (in ASCII)                     .
*  So the correct response for any item is the highest response   .
*  value found for it, assuming (right!) no data errors.          .

AGGREGATE
    /OUTFILE=* MODE=ADDVARIABLES
    /BREAK  =  Item
    /KEY   'Correct response for this item' = MAX(Response).

.  /**/ LIST  /*-*/.

List
|-----------------------------|---------------------------|
|Output Created               |18-APR-2007 13:11:21       |
|-----------------------------|---------------------------|
ID Item Response KEY

S1    1 A        A
S1    2 1        C
S1    3 3        B
S1    4 2        C
S1    5 B        B
S1    6 C        C
S1    7 2        A
S2    1 2        A
S2    2 1        C
S2    3 B        B
S2    4 4        C
S2    5 B        B
S2    6 2        C
S2    7 2        A
S3    1 A        A
S3    2 C        C
S3    3 B        B
S3    4 C        C
S3    5 3        B
S3    6 C        C
S3    7 A        A

Number of cases read:  21    Number of cases listed:  21
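
As the comment above says, a key table would be the sturdier route, since MAX(Response) gives the wrong key for any item that nobody answered correctly. A sketch of that alternative follows; the file name itemkey.sav is made up for illustration, and the key values are simply the ones recovered above for the three test cases.

```spss
*  Sketch only; the file name itemkey.sav is made up for illustration.  .
*  Step 1, run before reading the response data: build the key table.   .
DATA LIST LIST /Item (F2) KEY (A1).
BEGIN DATA
1 A
2 C
3 B
4 C
5 B
6 C
7 A
END DATA.
SAVE OUTFILE='itemkey.sav'.

*  Step 2, in place of the AGGREGATE above, with the long file active.  .
SORT CASES BY Item ID.
MATCH FILES /FILE=* /TABLE='itemkey.sav' /BY Item.
SORT CASES BY ID Item.
```

The first sort is there because MATCH FILES needs the active file in order of the BY variable; the second puts the records back in subject order.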


*  The item statistics appear to be based on two properties of    .
*  each response: What it was; and whether it was right.          .

NUMERIC   Right  (F2)
          /Select (F2).
VAR LABEL Right  'Response correct?'
          /Select 'Selection'.

VAL LABEL Right  0 'Wrong'  1 'Correct'
          /Select 0 'Blank'
                  1 'A'      2 'B'
                  3 'C'      4 'D'
                  8 'Key err'
                  9 'Unk code'.

COMPUTE Right    = (Response EQ Key).
RECODE  Response
         (' '     = 0)
         ('1','A' = 1)
         ('2','B' = 2)
         ('3','C' = 3)
         ('4','D' = 4)
         (ELSE    = 9) INTO Select.
IF    Not Right
   AND ANY(Response,'A','B','C','D','E')
                            Select = 8.

.  /**/ LIST  /*-*/.

List
|-----------------------------|---------------------------|
|Output Created               |18-APR-2007 13:21:43       |
|-----------------------------|---------------------------|
ID Item Response KEY Right Select

S1    1 A        A      1     1
S1    2 1        C      0     1
S1    3 3        B      0     3
S1    4 2        C      0     2
S1    5 B        B      1     2
S1    6 C        C      1     3
S1    7 2        A      0     2
S2    1 2        A      0     2
S2    2 1        C      0     1
S2    3 B        B      1     2
S2    4 4        C      0     4
S2    5 B        B      1     2
S2    6 2        C      0     2
S2    7 2        A      0     2
S3    1 A        A      1     1
S3    2 C        C      1     3
S3    3 B        B      1     2
S3    4 C        C      1     3
S3    5 3        B      0     3
S3    6 C        C      1     3
S3    7 A        A      1     1

Number of cases read:  21    Number of cases listed:  21


*  Item-level statistics, via AGGREGATE, LIST:                    .

AGGREGATE OUTFILE=*
    /BREAK = Item
    /NResp 'Number responding to item'     = NU
    /KEY   'Correct response for this item' = MAX(KEY)
    /Prop  'Proportion of correct answers' = FIN(Right, 0.9,1.1)
    /A     'Proportion of answers 1,A'     = FIN(SELECT,0.9,1.1)
    /B     'Proportion of answers 2,B'     = FIN(SELECT,1.9,2.1)
    /C     'Proportion of answers 3,C'     = FIN(SELECT,2.9,3.1)
    /D     'Proportion of answers 4,D'     = FIN(SELECT,3.9,4.1)
    /Blank 'Proportion of blank answers'   = FIN(SELECT,-1 ,0.1)
    /Anom  'Proportion anomalous answrs'   = FIN(SELECT,7.9,9.1).
FORMATS   Prop TO Anom (F5.3).
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |18-APR-2007 13:21:44       |
|-----------------------------|---------------------------|
Item   NResp KEY  Prop     A     B     C     D Blank  Anom

    1       3 A    .667  .667  .333  .000  .000  .000  .000
    2       3 C    .333  .667  .000  .333  .000  .000  .000
    3       3 B    .667  .000  .667  .333  .000  .000  .000
    4       3 C    .333  .000  .333  .333  .333  .000  .000
    5       3 B    .667  .000  .667  .333  .000  .000  .000
    6       3 C    .667  .000  .333  .667  .000  .000  .000
    7       3 A    .333  .333  .667  .000  .000  .000  .000

Number of cases read:  7    Number of cases listed:  7

===================
APPENDIX: Test data
===================
* ......................................................          .
* ............   Test data               ...............          .
DATA LIST LIST SKIP=2
   /ID (A2) It1 TO It7 (7A1).
BEGIN DATA
    It 1 2 3 4 5 6 7
    ----------------
    S1 A 1 3 2 B C 2
    S2 2 1 B 4 B 2 2
    S3 A C B C 3 C A
END DATA.
* ............   Post after this point   ...............          .
* ......................................................          .
LIST.
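
Coming back to the original question, the long layout makes the conditional recode short, because the key is available as a variable on every record. The following is a sketch, assuming the long file with Response and KEY as listed above (that is, run before the final AGGREGATE, which replaces the working file) and following the rule from the first post. OptA is a made-up name.

```spss
*  Option A versus the correct answer, one record per response.         .
*  Key is A: A = 1, all other responses = 0.                             .
*  Key not A: chose A (coded 1) = 1, chose the key = 0, else missing.    .
NUMERIC OptA (F2).
DO IF (KEY EQ 'A').
+  COMPUTE OptA = (Response EQ 'A').
ELSE IF (Response EQ '1').
+  COMPUTE OptA = 1.
ELSE IF (Response EQ KEY).
+  COMPUTE OptA = 0.
END IF.
LIST.
```

Options B, C and D would follow the same pattern with 'B' and '2', and so on. To correlate these indicators with the total score, the file would presumably have to go back to wide form first (CASESTOVARS), as noted earlier in this post.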

Re: Conditional recoding?

Smith, Larissa
Richard,

Thank you!  I'll see what I can do with that.

- Larissa


-----Original Message-----
From: Richard Ristow [mailto:[hidden email]]
Sent: Wednesday, April 18, 2007 12:36 PM
To: Smith, Larissa; [hidden email]
Cc: Gene Maguin
Subject: RE: Conditional recoding?

At 09:54 AM 4/18/2007, Smith, Larissa wrote:

Re: striping characters from a string variable

Melissa Ives
In reply to this post by Zdaniuk, Bozena
string newID (A4).
compute newID=concat(substr(ID,1,1),substr(ID,4,3)).

Melissa
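
To try this on the example from the question, a small self-contained job along these lines should do. It is a sketch with one made-up test case; the variable names follow the two lines above.

```spss
* Sketch: one test case from the question, then the two-piece extraction.
DATA LIST LIST /ID (A11).
BEGIN DATA
123456abcde
END DATA.
STRING newID (A4).
* Third argument of SUBSTR is a length: 3 characters starting at position 4.
COMPUTE newID = CONCAT(SUBSTR(ID,1,1), SUBSTR(ID,4,3)).
LIST.
```

For the datum 123456abcde, newID comes out as 1456.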

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Smith, Larissa
Sent: Wednesday, April 18, 2007 10:17 AM
To: [hidden email]
Subject: Re: Conditional recoding?

>Here's where I'm stuck. These may be psychological technical terms.
>What are
>"proportion-endorsed correlations"
>"item-total correlations"?

The item-total correlation is just the corrected item-total correlation
from SPSS' reliability procedure - like Gene said, the correlation
between the item and the total test score with item removed.  The
proportion endorsed is the proportion of students in the population who
got the item correct or who put down a particular response option as
their answer.

>OK, the first column is, 'correct' or 'not correct'. Why are the other
>three missing?

Sorry, that wasn't a dot as in missing, it was an ellipsis as in "and so
forth."  There's no missing data, I just didn't fill in the entire
matrix.

>What, in short, do the four columns derived from one response,
>represent?

For the point-biserial correlations, in order: (1) The corrected
item-total correlation between answer option A and the total with the
item removed; (2) the corrected item-total correlation between answer
option B and the total with the item removed; (3) the corrected
item-total correlation between answer option C and the total with the
item removed; and (4) the corrected item-total correlation between
answer option D and the total with the item removed.

I'll try the VARSTOCASES - it looks like it might be helpful.

>what is the algorithm to
>get the four values, based on the correct response and the actual
>response?

This is going to be very long, but it will probably be more helpful than
my trying to explain.  This is the code I'm currently using. The
variable P_VALUE is not actually a p<.05 type of p value; it's a
proportion-endorsed value for how many examinees got that item right or
selected that answer option.

RECODE
  I1 to I7
  ('A'=1)  ('1'=1) (ELSE=0)  INTO  ia1 TO ia7.
EXECUTE .
RECODE
  I1 to I7
  ('B'=1) ('2'=1) (ELSE=0)  INTO  ib1 TO ib7.
EXECUTE .
RECODE
  I1 to I7
  ('C'=1) ('3'=1) (ELSE=0)  INTO  ic1 TO ic7.
EXECUTE .
RECODE
  I1 to I7
  ('D'=1) ('4'=1) (ELSE=0)  INTO  id1 TO id7.
EXECUTE .


*********start MATRIX procedure for operational
items***************************************************************
** Change the variable name for the total score for each subject

SET MXLOOP=8000000 WIDTH=200 LENGTH=NONE.
MATRIX.
GET mir/VARIABLES=item1 TO item7.
GET mia/VARIABLES=ia1 to ia7.
GET mib/VARIABLES=ib1 to ib7.
GET mic/VARIABLES=ic1 to ic7.
GET mid/VARIABLES=id1 to id7.

GET TOTAL/VARIABLES=total.

COMPUTE ID1={1:7}.

COMPUTE ID=T(ID1).
COMPUTE ncases=NROW(mir).
COMPUTE ntot=T(CSUM(mir)).
COMPUTE ntota=T(CSUM(mia)).
COMPUTE ntotb=T(CSUM(mib)).
COMPUTE ntotc=T(CSUM(mic)).
COMPUTE ntotd=T(CSUM(mid)).
COMPUTE P_VALUE1=CSUM(mir)/ncases.
COMPUTE PA1=CSUM(mia)/ncases.
COMPUTE PB1=CSUM(mib)/ncases.
COMPUTE PC1=CSUM(mic)/ncases.
COMPUTE PD1=CSUM(mid)/ncases.
COMPUTE P_VALUE=T(P_VALUE1).
COMPUTE PA=T(PA1).
COMPUTE PB=T(PB1).
COMPUTE PC=T(PC1).
COMPUTE PD=T(PD1).
COMPUTE omit=1-PA-PB-PC-PD.
COMPUTE pbs=MAKE(7,1,0).
COMPUTE pbsa=MAKE(7,1,0).
COMPUTE pbsb=MAKE(7,1,0).
COMPUTE pbsc=MAKE(7,1,0).
COMPUTE pbsd=MAKE(7,1,0).
COMPUTE Q_VALUE=1-P_VALUE.
COMPUTE QA=1-PA.
COMPUTE QB=1-PB.
COMPUTE QC=1-PC.
COMPUTE QD=1-PD.
****COMPUTE TOTAL=RSUM(mir).
COMPUTE TOTALr=MAKE(ncases,7,0).
COMPUTE TOTALa=MAKE(ncases,7,0).
COMPUTE TOTALb=MAKE(ncases,7,0).
COMPUTE TOTALc=MAKE(ncases,7,0).
COMPUTE TOTALd=MAKE(ncases,7,0).
LOOP i=1 to 7.
LOOP j=1 to ncases.
DO IF (mir(j,i) EQ 1).
COMPUTE TOTALr(j,i)=TOTAL(j).
END IF.
DO IF (mia(j,i) EQ 1).
COMPUTE TOTALa(j,i)=TOTAL(j).
END IF.
DO IF (mib(j,i) EQ 1).
COMPUTE TOTALb(j,i)=TOTAL(j).
END IF.
DO IF (mic(j,i) EQ 1).
COMPUTE TOTALc(j,i)=TOTAL(j).
END IF.
DO IF (mid(j,i) EQ 1).
COMPUTE TOTALd(j,i)=TOTAL(j).
END IF.
END LOOP.
END LOOP.
COMPUTE U11=T(CSUM(TOTALr)).
COMPUTE U11a=T(CSUM(TOTALa)).
COMPUTE U11b=T(CSUM(TOTALb)).
COMPUTE U11c=T(CSUM(TOTALc)).
COMPUTE U11d=T(CSUM(TOTALd)).
COMPUTE Ux=CSUM(TOTAL)/ncases.
COMPUTE INT0=TOTAL-Ux.
COMPUTE dx=SQRT(CSSQ(INT0)/(ncases-1)).
COMPUTE U1=MAKE(7,1,0).
COMPUTE U1a=MAKE(7,1,0).
COMPUTE U1b=MAKE(7,1,0).
COMPUTE U1c=MAKE(7,1,0).
COMPUTE U1d=MAKE(7,1,0).
LOOP i=1 to 7.
DO IF (ntot(i) NE 0).
COMPUTE U1(i)=U11(i)/ntot(i).
ELSE.
COMPUTE U1(i)=Ux.
END IF.

DO IF (ntota(i) NE 0).
COMPUTE U1a(i)=U11a(i)/ntota(i).
ELSE.
COMPUTE U1a(i)=Ux.
END IF.

DO IF (ntotb(i) NE 0).
COMPUTE U1b(i)=U11b(i)/ntotb(i).
ELSE.
COMPUTE U1b(i)=Ux.
END IF.

DO IF (ntotc(i) NE 0).
COMPUTE U1c(i)=U11c(i)/ntotc(i).
ELSE.
COMPUTE U1c(i)=Ux.
END IF.

DO IF (ntotd(i) NE 0).
COMPUTE U1d(i)=U11d(i)/ntotd(i).
ELSE.
COMPUTE U1d(i)=Ux.
END IF.
END LOOP.
COMPUTE INT1=MAKE(7,1,0).
COMPUTE INT1a=MAKE(7,1,0).
COMPUTE INT1b=MAKE(7,1,0).
COMPUTE INT1c=MAKE(7,1,0).
COMPUTE INT1d=MAKE(7,1,0).
COMPUTE INT2=(U1-Ux)/dx.
COMPUTE INT2a=(U1a-Ux)/dx.
COMPUTE INT2b=(U1b-Ux)/dx.
COMPUTE INT2c=(U1c-Ux)/dx.
COMPUTE INT2d=(U1d-Ux)/dx.
COMPUTE INT3=MAKE(7,1,0).
COMPUTE INT3a=MAKE(7,1,0).
COMPUTE INT3b=MAKE(7,1,0).
COMPUTE INT3c=MAKE(7,1,0).
COMPUTE INT3d=MAKE(7,1,0).
COMPUTE PTBN=MAKE(7,1,0).
COMPUTE PTBNa=MAKE(7,1,0).
COMPUTE PTBNb=MAKE(7,1,0).
COMPUTE PTBNc=MAKE(7,1,0).
COMPUTE PTBNd=MAKE(7,1,0).
LOOP i=1 to 7.
DO IF (Q_VALUE(i)<>0).

COMPUTE INT1(i)=SQRT(P_VALUE(i)/Q_VALUE(i)).
COMPUTE PTBN(i)=INT1(i)*INT2(i).
ELSE.
COMPUTE PTBN(i)=0.
END IF.
DO IF (QA(i)<>0).
     COMPUTE INT1a(i)=SQRT(PA(i)/QA(i)).
     COMPUTE PTBNa(i)=INT1a(i)*INT2a(i).

ELSE.
 COMPUTE PTBNa(i)=0.
END IF.
DO IF (QB(i)<>0).
COMPUTE INT1b(i)=SQRT(PB(i)/QB(i)).
COMPUTE PTBNb(i)=INT1b(i)*INT2b(i).

ELSE.
COMPUTE PTBNb(i)=0.
END IF.
DO IF (QC(i)<>0).
COMPUTE INT1c(i)=SQRT(PC(i)/QC(i)).
COMPUTE PTBNc(i)=INT1c(i)*INT2c(i).

ELSE.
COMPUTE PTBNc(i)=0.
END IF.
DO IF (QD(i)<>0).
COMPUTE INT1d(i)=SQRT(PD(i)/QD(i)).
COMPUTE PTBNd(i)=INT1d(i)*INT2d(i).

ELSE.
COMPUTE PTBNd(i)=0.
END IF.
COMPUTE INT3(i)=P_VALUE(i)*Q_VALUE(i).
COMPUTE INT3a(i)=PA(i)*QA(i).
COMPUTE INT3b(i)=PB(i)*QB(i).
COMPUTE INT3c(i)=PC(i)*QC(i).
COMPUTE INT3d(i)=PD(i)*QD(i).
END LOOP.
COMPUTE INT4=SQRT(INT3).
COMPUTE INT4a=SQRT(INT3a).
COMPUTE INT4b=SQRT(INT3b).
COMPUTE INT4c=SQRT(INT3c).
COMPUTE INT4d=SQRT(INT3d).
LOOP i=1 to 7.
COMPUTE
pbs(i)=(PTBN(i)*dx-INT4(i))/(SQRT(INT3(i)+dx*dx-2*PTBN(i)*dx*INT4(i))).
COMPUTE
pbsa(i)=(PTBNa(i)*dx-INT4a(i))/(SQRT(INT3a(i)+dx*dx-2*PTBNa(i)*dx*INT4a(
i))).
COMPUTE
pbsb(i)=(PTBNb(i)*dx-INT4b(i))/(SQRT(INT3b(i)+dx*dx-2*PTBNb(i)*dx*INT4b(
i))).
COMPUTE
pbsc(i)=(PTBNc(i)*dx-INT4c(i))/(SQRT(INT3c(i)+dx*dx-2*PTBNc(i)*dx*INT4c(
i))).
COMPUTE
pbsd(i)=(PTBNd(i)*dx-INT4d(i))/(SQRT(INT3d(i)+dx*dx-2*PTBNd(i)*dx*INT4d(
i))).
END LOOP.
SAVE {ID,P_VALUE,pbs,PA,PB,PC,PD,OMIT,pbsa,pbsb,pbsc,pbsd}
 /OUTFILE='outfile.sav'
 /VARIABLES=ID,P_VALUE,pbs,pa,pb,pc,pd,omit,pbsa,pbsb,pbsc,pbsd.
END MATRIX.
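
Read as formulas, the two loops above seem to amount to the following. This is only a sketch under assumptions: Ux is taken to be the overall mean total score, dx its standard deviation, U1 the mean total score of those endorsing the item or option, and Q_VALUE to be 1 - P_VALUE; none of those definitions appear in this part of the syntax. With p = P_VALUE, q = Q_VALUE, r_pb = PTBN and r_corr = pbs:

```latex
% Point-biserial correlation of a 0/1 item with the total score
% (PTBN = INT1 * INT2 in the syntax)
r_{pb} = \frac{\bar{X}_1 - \bar{X}}{s_X}\,\sqrt{\frac{p}{q}}

% Corrected (item-removed) correlation (pbs in the syntax),
% where \sqrt{pq} is INT4 and pq is INT3
r_{corr} = \frac{r_{pb}\, s_X - \sqrt{pq}}
                {\sqrt{pq + s_X^{2} - 2\, r_{pb}\, s_X \sqrt{pq}}}
```

The second expression has the form of the usual part-whole correction, with sqrt(pq) playing the role of the standard deviation of a 0/1 item.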

>And what summary statistics, within subjects or across subjects, do you
>want?

Across subjects, the corrected item-total correlations for each response
option - the corrected correlation between response option A and the
total, the corrected correlation between response option B and the
total, and so forth.  The snag is that instead of calculating the
correlation with, for example, A and 1 set to 1 and everything else set
to zero as in the recode statement above, I need to calculate it with A
and 1 set to 0, the correct answer option set to 1, and the other
response options set to missing.  This is easy to do when the correct
answer is not in fact A; I'm just not sure how to code to correct for
when it is.
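
One way the rule might be sketched, if the answer key is known in advance. The key list here is made up purely for illustration, and the treatment of an item keyed A (A = 1, everything else = 0) follows the recode described at the top of the thread; both are assumptions, as is MC1 TO MC12 being adjacent string variables in the file.

```spss
* Sketch only: the key values below are hypothetical, one per item.
* Replace them with the real answer key before use.
DO REPEAT mc  = MC1 TO MC12
         /ia  = ia1 TO ia12
         /key = 'A' 'B' 'C' 'D' 'A' 'B' 'C' 'D' 'A' 'B' 'C' 'D'.
DO IF (key = 'A').
* Keyed A: a correct A scores 1 and every other response scores 0.
RECODE mc ('A'=1) (ELSE=0) INTO ia.
ELSE.
* Not keyed A: choosing A is 0, the correct option is 1, the rest missing.
RECODE mc ('1'=0) ('B','C','D'=1) (ELSE=SYSMIS) INTO ia.
END IF.
END DO REPEAT.
EXECUTE.
```

The same pattern would be repeated for ib, ic and id, swapping the letter and digit being tested.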

- Larissa



Re: Conditional recoding?

Richard Ristow
In reply to this post by Smith, Larissa
I'm not working on this right now; just thinking, organizing toward
maybe getting nearer a solution later. This is responding to a couple
of posts back, where you said more about the summary statistics you
want.

At 10:16 AM 4/18/2007, Smith, Larissa wrote:

>The item-total correlation is just the corrected item-total
>correlation from SPSS' reliability procedure - like Gene said, the
>correlation between the item and the total test score with item
>removed.  The proportion endorsed is the proportion of students in the
>population who got the item correct or who put down a particular
>response option as their answer.

Let's see how nearly I have this.

For any single response to one item, there are four mutually exclusive
events, namely: response was 1, 2, 3 or 4. (Or a fifth as well -
response was blank?)

There is another event, not mutually exclusive with these: That the
response was correct. For each item, this event is identical with one
of the above 4; but which of them, varies from item to item. (That is,
which response is correct, varies from item to item.)

So from a respondent's responses to N items, there are 5*N events: That
for item k, respondent answered 1, 2, 3, or 4; and, that for item k,
respondent gave the correct answer.

(I think that when you specified what you wanted, you gave only the
first four: ia, ib, ic and id. For the moment, at least, I'll consider
the last separately, and call it ir - the event that the response was
correct. Whether or not ir should be represented separately in the
data, I find it conceptually useful; see below.)

For each item k, you want
. The "item-total correlation", one number: The correlation of ir(k)
with SUM(i~=k)ir(i).
. The option-level "item-total correlations", four numbers: The correlations of
ia(k)-id(k) with SUM(i~=k)ir(i). (Do I have that right? With the sums
of the *correct* responses on the other items? or is it ia(k) with
SUM(i~=k)ia(i)?)
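
Under the first reading (each option indicator against the sum of *correct* responses on the other items), the quantities for item 1 might be sketched as below. This assumes MC1 TO MC12 are adjacent string variables coded as described, with letters marking correct responses; the names ir1 TO ir12 and rest1 are illustrative, not from the original syntax.

```spss
* Declare the new variables first so that ir1 TO ir12 are adjacent.
NUMERIC ir1 TO ir12 ia1 TO ia12.
DO REPEAT mc = MC1 TO MC12 /ir = ir1 TO ir12 /ia = ia1 TO ia12.
* ir is 1 when the response is any letter, which marks a correct answer.
COMPUTE ir = ANY(mc,'A','B','C','D').
* ia is 1 when option A was chosen, whether correct (A) or not (1).
COMPUTE ia = ANY(mc,'A','1').
END DO REPEAT.
* Rest score for item 1: correct responses on the other eleven items.
COMPUTE rest1 = SUM(ir2 TO ir12).
CORRELATIONS /VARIABLES=ir1 ia1 WITH rest1.
* All twelve corrected item-total correlations of the ir at once.
RELIABILITY /VARIABLES=ir1 TO ir12 /MODEL=ALPHA /SUMMARY=TOTAL.
```

The ib, ic and id indicators would be built the same way and added to the CORRELATIONS list; rest scores for the other items need their own SUM with item k left out.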

The above ignores items with blank responses. You don't specify, and
maybe you don't have any of those. I could imagine, in such cases,
a) Scoring all of the ia-id, and ir, as 0
b) Scoring all of the ia-id, and ir, as missing
c) Scoring all of the ia-id as missing, and ir as 0 (i.e., we don't
know what they'd have responded, but anyhow, they didn't give the right
answer).

My first inclination would be to choose b), as methodologically
conservative; but that's really a subject-specialist's decision.
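
A sketch of how choice b) might look, assuming a blank response is stored as an all-blank string in MC1 TO MC12 (an assumption; the data may code omits differently). New numeric variables start out system-missing, so it is enough to skip the computation for blanks.

```spss
NUMERIC ir1 TO ir12 ia1 TO ia12.
DO REPEAT mc = MC1 TO MC12 /ir = ir1 TO ir12 /ia = ia1 TO ia12.
* Choice b: compute only for non-blank responses, so blanks stay missing.
DO IF (mc NE ' ').
COMPUTE ir = ANY(mc,'A','B','C','D').
COMPUTE ia = ANY(mc,'A','1').
END IF.
END DO REPEAT.
EXECUTE.
```

For a), dropping the DO IF and END IF lines would score blanks as 0, since ANY returns 0 when nothing matches. For c), only the COMPUTE for ir would move outside the DO IF.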

Good luck to you, but really to us,
Richard