complicated matching cases issues- need help to rescue my data

Jeanie Li-2
Hi,
I am a Ph.D. student and am stuck on a complicated issue related to
matching cases across different files. I collaborate with a professor
at another university to administer an online questionnaire to her
students. The questionnaire has 20 subscales. One semester our RA made
a mistake, so that instead of logging in with one ID to fill out all
20 subscales, subjects logged in multiple times with different IDs to
complete different portions of the questionnaire. Even though each
subject has several IDs that were automatically generated by the
computer, their records are also labeled with their student ID.

Scale 1
ID  student ID  item 1  item 2  item 3  item 4  item 5
1   1           3       2       2       2       3
2   1           3       2       3       2       3
3   2           1       1       1       1       1
Scale 2
ID  student ID  item 1  item 2  item 3  item 4  item 5
2   1           1       1       1       1       1

For example, student 1 logged in as ID 1 to complete scale 1. Some time
later he logged in again as ID 2 to redo scale 1 and continue through
scale 5. Then he logged in as ID 3 to finish scales 6-8. Student 2
logged in with ID 3 to complete scales 1-7, then logged in as ID 4 to
complete scales 6-10. Then she dropped out of the study.

My goal is to identify subjects who have completed every item of the
20 scales. I need to identify duplicate cases and match cases within
each subscale file and across the 20 subscale files. I also need to
identify cases with missing values.

I am just trying to see if I can rescue part of my data. I wonder if
anyone could help me solve this case-matching problem. Thanks a lot.

Re: complicated matching cases issues

Richard Ristow
At 07:25 AM 4/10/2010, Jeanie Li wrote:

I am stuck on one issue related to matching cases across different files. I collaborate with a professor at another university to administer an online questionnaire to her students. The questionnaire has 20 subscales. One semester, instead of subjects logging in with one ID to fill out the 20 subscales, they logged in multiple times with different IDs to complete different portions of the questionnaire. [Their records] are also labeled with their student ID.

With the student ID in the records, this doesn't sound too bad.

It looks like you have a separate file for each scale:
Scale 1
ID student ID item 1 item 2 item 3 item 4  item 5
1   1           3      2      2      2       3
2   1           3      2      3      2       3
Scale 2
ID student ID item 1 item 2 item 3 item 4  item 5
2   1           1      1      1      1       1

First, now, before you do anything else, add a variable to every record that identifies which scale it's for:

Scale 1
Scale ID student ID item 1 item 2 item 3 item 4  item 5
  1   1       1       3      2      2      2       3
  1   2       1       3      2      3      2       3
Scale 2
Scale ID student ID item 1 item 2 item 3 item 4  item 5
  2   2       1       1      1      1      1       1
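
If the scales are in separate files, one way to add the scale variable and stack the files in the same step is sketched below. It is untested, and the file names are hypothetical. It assumes the ID, student ID and item variables have the same names (ID, student_ID, item_1, item_2, ...) in every file, so that ADD FILES lines them up.

```spss
*  Stack the scale files. Each /IN flag is 1 for records that       .
*  came from that file and 0 otherwise. File names are made up.     .

ADD FILES
   /FILE='scale01.sav' /IN=InScale01
   /FILE='scale02.sav' /IN=InScale02.

*  Turn the flags into a single scale number. Extend the sum        .
*  (and the ADD FILES list) to cover all 20 files.                  .

COMPUTE Scale = 1*InScale01 + 2*InScale02.
EXECUTE.
```

The flag variables can be dropped once Scale has been checked.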

The following should work for files with data for only one scale, or for a file with all data together. As written, it assumes the response variables are numeric.

It creates one record per scale per student, with responses recorded for every item the student has ever responded to. If the student's responded more than once to any item, it keeps the response from the highest-numbered ID, which I suppose is the latest; that can be changed, if desired.

To keep a record of where the results came from, the file also contains
* The number of records for that scale for that student
* The lowest and highest IDs for that scale for that student

This code is not tested.

*  Make sure the raw data dataset is named, so it won't be          .
*  lost if you use another dataset. Skip this step, if the          .
*  data is already in a named dataset which is the active           .
*  dataset.                                                         .

DATASET NAME      RawData WINDOW=FRONT.

*  SORT CASES is rarely necessary for AGGREGATE. Here it is,        .
*  so the LAST function will get the value from the highest-        .
*  numbered ID.                                                     .

SORT CASES BY Scale student_ID ID.

*  Create the summary, with one record per scale per student,       .
*  in a separate dataset                                            .

DATASET DECLARE   Summary WINDOW=HIDDEN.

AGGREGATE OUTFILE=Summary
   /BREAK=Scale student_ID
   /NRecs 'Number of records summarized'           = NU
   /MinID 'Lowest  ID, for this student and scale' = MIN(ID)
   /MaxID 'Highest ID, for this student and scale' = MAX(ID)
   /      item_1 TO item_5  /* This only works with variable names */
   = LAST(item_1 TO item_5) /* in form 'name'+'numeric suffix'.    */.
  
  
*  Make the summary dataset active, for further processing          .

DATASET ACTIVATE  Summary WINDOW=FRONT.
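
From the summary dataset, a possible next step toward finding students who completed every item of all 20 scales is sketched here. Like the code above, it is untested. It assumes every scale has five items named item_1 to item_5, as in the example data; the names NMissing, Complete, ByStudent, NScales, NComplete and AllDone are made up for illustration.

```spss
*  Count the items still missing in each scale-by-student record.   .

COMPUTE NMissing = NMISS(item_1 TO item_5).
COMPUTE Complete = (NMissing EQ 0).

*  One record per student: how many scales have any record, and     .
*  how many of those scales are fully answered.                     .

DATASET DECLARE   ByStudent WINDOW=HIDDEN.

AGGREGATE OUTFILE=ByStudent
   /BREAK=student_ID
   /NScales   'Number of scales with any response' = NU
   /NComplete 'Number of scales fully answered'    = SUM(Complete).

DATASET ACTIVATE  ByStudent WINDOW=FRONT.

*  Flag students who have all 20 scales complete.                   .

COMPUTE AllDone = (NComplete EQ 20).
FREQUENCIES VARIABLES=AllDone.
```

If the scales have different numbers of items, the NMISS count would need to be computed separately for each scale.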
