Merging data or table lookup problem

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Merging data or table lookup problem

Harrisonlv
Have 2x datasets A & B.

Dataset A contains the subjects ID# & answers to questions phq1 to phq240.

xxxx-xx-xxxx   1251212351....

Dataset B contains the item name, possible response categories, & weights to
be assigned each response category.

phq1      1    .3355
phq1      2    .4736
phq1      3    .2692
phq1      4    .2628
phq1      5    .3103
phq2      1    .2345
phq2      2    .3445
.
.
phq240

How do I use dataset B as a table lookup for dataset A so that I can score a
subject's answers to questions phq1 to phq240 according to their assigned
weights in dataset B?

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Merging data or table lookup problem

Maguin, Eugene
Harrison,

I think there are a couple of ways to do this. What follows is untested. I
think the easier way is to restructure dataset A from wide to long. I assume
dataset A looks like

Pid phq1 phq2 ... Phw240

Restructure then sort.

Varstocases /make phq from phq1 to phq240/null=keep/count=vnum.
Sort cases by vnum phq.


The result will look like

Pid           phq  vnum
1234-56-1111   1     1
1234-56-1112   1     1
1234-56-1113   1     1
...
9999-99-9999   5   240

The sort is needed to match up with dataset B


I'm assuming dataset B looks like

Varid varvalue weight
phq1      1    .3355

In dataset A the variable vnum carries the variable number, i.e.. Vnum=1 for
phq1, etc.  A similar variable needs to be created in dataset B. So in
dataset B:

Sort cases by varid varvalue.
Do if ($casenum eq 1).
+  compute vnum=1.
Else if (varid eq lag(varid)).
+  compute vnum=lag(vnum).
Else.
+  compute vnum=lag(vnum)+1.
End if.

So now dataset B looks like

Varid varvalue weight vnum
phq1      1    .3355   1
...
phq1      5    .3103   1
phq2      1    .2345   2

Now match files using the table subcommand.

Match files file='dataset A'/rename=(phq=varvalue)/
            table='dataset B'/by vnum varvalue.
Execute.

The rename is needed because I gave the same functional variable different
names in the two datasets. A rename will have to be done some where. I did
it here, it could be done elsewhere.

The resulting dataset will look like

Pid          varvalue  vnum  weight
1234-56-1111   1        1    .3355
1234-56-1122   1        1    .3355
1234-56-1145   1        1    .3355
...
9999-99-9934   5      240    .2345


Next, sort cases to get back to the correct order.

Sort cases by pid vnum.

I'm not sure what you need to do from this point on but it sounds like you
might simply add the weights for each person. Thus:

Aggregate outfile=*/break=pid/total=sum(weight).

>>>> Warning <<<<
No allowance has been made for missing data in phq1 to phq240, either user
or sysmis. If usermissing data is present, I think you will have to turn the
usermissing designation off so that the match will work and will give sysmis
to the weight variable in the final dataset. If you have sysmis values, I
think you should recode the sysmis value to a user missing value and proceed
as for usermissing. Careful data checking should be done.

Gene Maguin



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Harrison
Sent: Wednesday, January 26, 2011 2:59 PM
To: [hidden email]
Subject: Merging data or table lookup problem

Have 2x datasets A & B.

Dataset A contains the subjects ID# & answers to questions phq1 to phq240.

xxxx-xx-xxxx   1251212351....

Dataset B contains the item name, possible response categories, & weights to
be assigned each response category.

phq1      1    .3355
phq1      2    .4736
phq1      3    .2692
phq1      4    .2628
phq1      5    .3103
phq2      1    .2345
phq2      2    .3445
.
.
phq240

How do I use dataset B as a table lookup for dataset A so that I can score a
subject's answers to questions phq1 to phq240 according to their assigned
weights in dataset B?

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Merging data or table lookup problem

Bruce Weaver
Administrator
In reply to this post by Harrisonlv
How about something like the following?  

new file.
dataset close all.

data list list / id (a12) A1 A2 (2f1.0).
begin data
aaaa-aa-aaaa 1 2
bbbb-bb-bbbb 2 3
cccc-cc-cccc 4 5
end data.
dataset name A .

data list list / question (a6) answer(f1.0) wght (f5.4).
begin data
phq1      1    .3355
phq1      2    .4736
phq1      3    .2692
phq1      4    .2628
phq1      5    .3103
phq2      1    .2345
phq2      2    .3445
phq2      3    .2692
phq2      4    .2628
phq2      5    .3103
end data.
dataset name B.

* First, restructure dataset A from WIDE to LONG .
dataset activate A.

VARSTOCASES
  /MAKE Answer FROM A1 A2
  /INDEX=item(2)
  /KEEP=id
  /NULL=KEEP.

* Now prepare dataset B for merging.
* Specifically, recode QUESTION into a numeric variable called ITEM .
* AUTORECODE will not work properly, because phq10 < phq2, etc; so
* the ITEM numbers would be out of order.  Therefore, we must use
* another method.  The approach below assumes the questions are
* in order from phq1 to phq240, as shown in mailing list post.

DATASET ACTIVATE B.
do if $casenum EQ 1.
- compute item = 1.
else.
- compute #increment = question NE lag(question).
- compute item = lag(item) + #increment.
end if.
exe.

DATASET ACTIVATE A.
sort cases by item answer.
MATCH FILES
  FILE=* /
  TABLE='B' /
  BY item answer.
exe.

* If you want the original case order, sort again.
sort cases by id item.



Harrisonlv wrote
Have 2x datasets A & B.

Dataset A contains the subjects ID# & answers to questions phq1 to phq240.

xxxx-xx-xxxx   1251212351....

Dataset B contains the item name, possible response categories, & weights to
be assigned each response category.

phq1      1    .3355
phq1      2    .4736
phq1      3    .2692
phq1      4    .2628
phq1      5    .3103
phq2      1    .2345
phq2      2    .3445
.
.
phq240

How do I use dataset B as a table lookup for dataset A so that I can score a
subject's answers to questions phq1 to phq240 according to their assigned
weights in dataset B?

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).