|
Dear Listers:
I'm sure this is a truly simple problem, but alas, I am running out of time... I have a dataset in which there are multiple rows for some of the cases and one row for the others. For the cases with multiple rows, there are only actual data differences for just three variables: Score, percent, and testtype. The variables are defined as follows: 1) Score = science achievement score in one of four content/grade level tests 2) Percent = associate with the test score 3) Testtype = identifies whether the testtype is 1=physical sci/fourth grade; 2=physical science/fifth grade; 3=earth science/fourth grade; 4=earth science/fifth grade. If a student completed two tests (which they legitimately could), then they have two rows. About half of the cases took one test (so just one row). I found several cases who completed three tests (three rows for those cases). Here is an example of the data (there are also demographic variables that come after ID1 and the values of these across the rows for a case are the same): Dist School V3 V4 V5 TESTTYPE V7 Score Percent Lastname Firstname ID1 3 18 32 5 3 4 1 9 37.50 Smith John 0000002222 3 18 32 5 3 2 1 10 41.67 Smith John 0000002222 3 18 32 5 3 4 1 13 54.17 Jones Mary 0000003333 3 18 32 5 3 2 1 15 62.50 Jones Mary 0000003333 3 18 38 5 3 2 1 6 25.00 Brown Robert 0000004444 So, the first four lines represent two cases with two rows per case and the last is a person who just took one of the four tests. All identifying information is fictitious. What I need is to have one row per case. It seems to me that I would need to have four score variables instead of the one as well as four percent variables. Any suggestions would be most appreciated. |
|
Hi Miriam,
Try Data -> Restructure in the menu, then select Restructure selected cases into variables and follow the wizard. Do not forget to save the file before because it replaces the data. Regards Jan -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Miriam Fultz, Ph.D. Sent: Monday, August 27, 2007 4:01 AM To: [hidden email] Subject: Combining Cases into Single Row Dear Listers: I'm sure this is a truly simple problem, but alas, I am running out of time... I have a dataset in which there are multiple rows for some of the cases and one row for the others. For the cases with multiple rows, there are only actual data differences for just three variables: Score, percent, and testtype. The variables are defined as follows: 1) Score = science achievement score in one of four content/grade level tests 2) Percent = associate with the test score 3) Testtype = identifies whether the testtype is 1=physical sci/fourth grade; 2=physical science/fifth grade; 3=earth science/fourth grade; 4=earth science/fifth grade. If a student completed two tests (which they legitimately could), then they have two rows. About half of the cases took one test (so just one row). I found several cases who completed three tests (three rows for those cases). Here is an example of the data (there are also demographic variables that come after ID1 and the values of these across the rows for a case are the same): Dist School V3 V4 V5 TESTTYPE V7 Score Percent Lastname Firstname ID1 3 18 32 5 3 4 1 9 37.50 Smith John 0000002222 3 18 32 5 3 2 1 10 41.67 Smith John 0000002222 3 18 32 5 3 4 1 13 54.17 Jones Mary 0000003333 3 18 32 5 3 2 1 15 62.50 Jones Mary 0000003333 3 18 38 5 3 2 1 6 25.00 Brown Robert 0000004444 So, the first four lines represent two cases with two rows per case and the last is a person who just took one of the four tests. All identifying information is fictitious. What I need is to have one row per case. It seems to me that I would need to have four score variables instead of the one as well as four percent variables. Any suggestions would be most appreciated. _____ Tato zpráva a všechny připojené soubory jsou důvěrné a určené výlučně adresátovi(-ům). Jestliže nejste oprávněným adresátem, je zakázáno jakékoliv zveřejňování, zprostředkování nebo jiné použití těchto informací. Jestliže jste tento mail dostali neoprávněně, prosím, uvědomte odesilatele a smažte zprávu i přiložené soubory. Odesilatel nezodpovídá za jakékoliv chyby nebo opomenutí způsobené tímto přenosem. This message and any attached files are confidential and intended solely for the addressee(s). Any publication, transmission or other use of the information by a person or entity other than the intended addressee is prohibited. If you receive this in error please contact the sender and delete the message as well as all attached documents. The sender does not accept liability for any errors or omissions as a result of the transmission. -.- -- |
|
In reply to this post by Miriam Fultz, Ph.D.
At 10:00 PM 8/26/2007, Miriam Fultz, Ph.D. wrote:
>I have a dataset in which there are multiple rows for some of the >cases and one row for the others. For the cases with multiple rows, >there are only actual data differences for just three variables: >Score, percent, and testtype. The variables are defined as follows: > >1) Score = science achievement score in one of four content/grade >level tests > >2) Percent = associate with the test score > >3) Testtype = identifies whether the testtype is 1=physical sci/fourth >grade; 2=physical science/fifth grade; 3=earth science/fourth grade; >4=earth science/fifth grade. > >If a student completed two tests (which they legitimately could), then >they have two rows. About half of the cases took one test (so just >one row). I found several cases who completed three tests (three rows >for those cases). > >Here is an example of the data (there are also demographic variables >that >come after ID1 and the values of these across the rows for a case are >the >same): |-----------------------------|---------------------------| |Output Created |27-AUG-2007 15:27:01 | |-----------------------------|---------------------------| Dist School V3 V4 V5 TEST V7 Score Percent Lastname Firstname ID1 3 18 32 5 3 4 1 9 37.50 Smith John 0000002222 3 18 32 5 3 2 1 10 41.67 Smith John 0000002222 3 18 32 5 3 4 1 13 54.17 Jones Mary 0000003333 3 18 32 5 3 2 1 15 62.50 Jones Mary 0000003333 3 18 38 5 3 2 1 6 25.00 Brown Robert 0000004444 Number of cases read: 5 Number of cases listed: 5 >So, the first four lines represent two cases with two rows per case >and the last is a person who just took one of the four tests. > >I need to have one row per case. It seems to me that I would need >four score variables instead of the one as well as four percent >variables. Indeed, so. Below is SPSS draft output to get that result. (Jan's suggestion is how you do it.) You'll find only 2 of each variable, because that's all their are in the test data, and CASESTOVARS creates only as many variables as needed for the data it sees. In your real data, with up to four records, it would create the four sets. Now, as a recurring theme: Are you sure you want to do this? CASESTOVARS changes data from 'long' organization (separate records for each logical instance in each case) to 'wide' (multiple logical instances per record). Think what you're going to do next, and perhaps post a follow-up. Many manipulations and analyses are easier with data in 'long' form. That's said, here's the syntax and output. I clicked up the syntax entirely from the menus. SORT CASES BY Dist School ID1 TEST . CASESTOVARS /ID = Dist School ID1 /INDEX = TEST /GROUPBY = INDEX . Notes |----------------------------|---------------------------| |Output Created |27-AUG-2007 15:27:01 | |----------------------------|---------------------------| Generated Variables |--------|----|---------| |Original|TEST|Result | |Variable| |---------| | | |Name | |--------|----|---------| |Score |2 |Score.2 | | |4 |Score.4 | |--------|----|---------| |Percent |2 |Percent.2| | |4 |Percent.4| |--------|----|---------| Processing Statistics |---------------|---| |Cases In |5 | |Cases Out |3 | |---------------|---| |Cases In/Cases |1.7| |Out | | |---------------|---| |Variables In |12 | |Variables Out |13 | |---------------|---| |Index Values |2 | |---------------|---| TEMPORARY. STRING SPACE (A5). LIST DIST to Percent.4 SPACE. List |-----------------------------|---------------------------| |Output Created |27-AUG-2007 15:27:01 | |-----------------------------|---------------------------| Di Sch Firstnam Scor Percent Scor Percent st ool ID1 V3 V4 V5 V7 Lastname e e.2 .2 e.4 .4 SPACE 3 18 0000002222 32 5 3 1 Smith John 10 41.67 9 37.50 3 18 0000003333 32 5 3 1 Jones Mary 15 62.50 13 54.17 3 18 0000004444 38 5 3 1 Brown Robert 6 25.00 . . Number of cases read: 3 Number of cases listed: 3 =================== APPENDIX: Test data =================== DATA LIST LIST SKIP=2/ Dist School V3 V4 V5 TEST V7 Score Percent Lastname Firstname ID1 (F2 F3 3(F2) F2 F2 F4 F6.2 2(A8) N10). BEGIN DATA Dist School V3 V4 V5 TEST V7 Score Percent Lastname Firstname ID1 Type 3 18 32 5 3 4 1 9 37.50 Smith John 0000002222 3 18 32 5 3 2 1 10 41.67 Smith John 0000002222 3 18 32 5 3 4 1 13 54.17 Jones Mary 0000003333 3 18 32 5 3 2 1 15 62.50 Jones Mary 0000003333 3 18 38 5 3 2 1 6 25.00 Brown Robert 0000004444 END DATA. FORMATS ID1 (N10). LIST. |
|
At 01:49 PM 8/28/2007, Miriam Fultz, Ph.D. wrote, off-list (I'm
replying on-list, by custom): >My immediate need is to determine the number of students who took both >the pre and post tests (for reporting purposes). The client has >defined "match" as a student who took the pretest and any one posttest >(the desired number of posttests completed is two). What would you >recommend in order to determine the total number of pre-post matches >-- AGGREGATE? Exactly. AGGREGATE is a great friend; see below. >Ultimately, I will conduct an MLM (HLM) analysis (but depending upon >the timing of the report, perhaps an ANCOVA per the desires of the >client). If I leave the file in a long form (which would facilitate >the MLM), I also want to know the number of have pre-one post; pre-two >posttests; pre -- three posttest scores. The following is SPSS 15 draft output (WRR: not saved separately). Test data is as in previous post, except shortening some names and formats to give shorter printed output lines. I don't know what marks a pre-test vs. a post-test. The following takes test types 1 and 2 as pre-test, and 3 and 4 as post-test; but that's easy to change. |-----------------------------|---------------------------| |Output Created |28-AUG-2007 16:34:15 | |-----------------------------|---------------------------| Dist School V3 V4 V5 TEST V7 Score Percent LName FName ID1 3 18 32 5 3 4 1 9 37.50 Smith John 0000002222 3 18 32 5 3 2 1 10 41.67 Smith John 0000002222 3 18 32 5 3 4 1 13 54.17 Jones Mary 0000003333 3 18 32 5 3 2 1 15 62.50 Jones Mary 0000003333 3 18 38 5 3 2 1 6 25.00 Brown Robert 0000004444 Number of cases read: 5 Number of cases listed: 5 TEMPORARY. NUMERIC PreTest PostTest (F2). IF ANY (TEST,1,2) PreTest = 1. IF ANY (TEST,3,4) PostTest = 1. AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK = Dist School ID1 /NPre 'Number of pre-tests' = SUM(PreTest) /NPost 'Number of post-tests' = SUM(PostTest). RECODE NPre NPost (SYSMIS = 0). FORMATS NPre NPost (F2). TEMPORARY. STRING SPACE (A10). LIST Dist TO NPost SPACE. List |-----------------------------|---------------------------| |Output Created |28-AUG-2007 16:37:27 | |-----------------------------|---------------------------| Di Sch TE Scor NP st ool V3 V4 V5 ST V7 e Percent LName FName ID1 re NPost SPACE 3 18 32 5 3 4 1 9 37.50 Smith John 0000002222 1 1 3 18 32 5 3 2 1 10 41.67 Smith John 0000002222 1 1 3 18 32 5 3 4 1 13 54.17 Jones Mary 0000003333 1 1 3 18 32 5 3 2 1 15 62.50 Jones Mary 0000003333 1 1 3 18 38 5 3 2 1 6 25.00 Brown Robert 0000004444 1 0 Number of cases read: 5 Number of cases listed: 5 =================== APPENDIX: Test data =================== (Identical to in previous post, except shortening names and formats to give shorter printed output lines.) DATA LIST LIST SKIP=2/ Dist School V3 V4 V5 TEST V7 Score Percent LName FName ID1 (F2 F3 3(F2) F2 F2 F4 F6.2 2(A6) N10). BEGIN DATA Dist School V3 V4 V5 TEST V7 Score Percent LName FName ID1 Type 3 18 32 5 3 4 1 9 37.50 Smith John 0000002222 3 18 32 5 3 2 1 10 41.67 Smith John 0000002222 3 18 32 5 3 4 1 13 54.17 Jones Mary 0000003333 3 18 32 5 3 2 1 15 62.50 Jones Mary 0000003333 3 18 38 5 3 2 1 6 25.00 Brown Robert 0000004444 END DATA. FORMATS ID1 (N10). LIST. |
| Free forum by Nabble | Edit this page |
