Hello all,
I am trying to find a quick way to restructure several large data files with information about student financial aid awards. Aid data are included in the files for two semesters with students appearing 1-9 times depending on how many awards they received. The structure is roughly the following. DATA LIST LIST / ID Award_Type Semester1 Semester2 BEGIN DATA. 1 Stafford 500 500 1 Academic 456 234 1 Pell 300 345 2 Pell 344 345 2 Stafford 500 678 3 Stafford 467 345 4 Merit 500 500 4 Academic 567 578 END DATA. I would like the file to appear something like the following, with the Award_Type variable serving as the variable label for each new variable and each student appearing only once in the file (e.g.,Semester1Award1): ID Semester1Award1 Semester1Award2 Semester1Award3 Semester2Award2 I have tried the following approach, but have been unable to find a way to restructure the file to retain information about the award type. COMPUTE caseid=$casenum. RANK VARIABLES= caseid by id /RANK INTO case_no. LIST. MATCH FILES FILE=* /BY id /LAST=last. VECTOR award (10F8.0). COMPUTE award (case_no)=Semester1. AGGREGATE /OUTFILE=* /BREAK=studentid /Award1 to Award10 = MAX (Award1 to Award10). EXE. I know there are several brute force approaches (which I have tried and do work), but with 100 different award types in some of the files it becomes time consuming and just seems pretty inelegant. Any help would be appreciated! Peace, Jake -- Jacob PK Gross Project on Academic Success-- Associate Director for Research Indiana University 2805 E. Tenth Street Smith Research Center, Suite 100 Bloomington, IN 47408 Phone: 812-855-1604 Fax: 812-855-7422 |
Have you retried Restructure File on the Data command? After fixing up your data list command (formats--since you've got a string lurking here and command terminator), and running through the restructure wizard, here's the syntax:
DATA LIST LIST / ID (f5) Award_Type (a8) Semester1 (f5) Semester2 (f5) . BEGIN DATA. 1 Stafford 500 500 1 Academic 456 234 1 Pell 300 345 2 Pell 344 345 2 Stafford 500 678 3 Stafford 467 345 4 Merit 500 500 4 Academic 567 578 END DATA. SAVE OUTFILE='C:\Program Files\SPSS\awards.sav' /COMPRESSED. CASESTOVARS /ID = ID /GROUPBY = VARIABLE . Try it yourself to see if this meets your need. -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jake Gross Sent: Friday, February 02, 2007 8:09 AM To: [hidden email] Subject: How to restructure financial aid data file? Hello all, I am trying to find a quick way to restructure several large data files with information about student financial aid awards. Aid data are included in the files for two semesters with students appearing 1-9 times depending on how many awards they received. The structure is roughly the following. DATA LIST LIST / ID Award_Type Semester1 Semester2 BEGIN DATA. 1 Stafford 500 500 1 Academic 456 234 1 Pell 300 345 2 Pell 344 345 2 Stafford 500 678 3 Stafford 467 345 4 Merit 500 500 4 Academic 567 578 END DATA. I would like the file to appear something like the following, with the Award_Type variable serving as the variable label for each new variable and each student appearing only once in the file (e.g.,Semester1Award1): ID Semester1Award1 Semester1Award2 Semester1Award3 Semester2Award2 I have tried the following approach, but have been unable to find a way to restructure the file to retain information about the award type. COMPUTE caseid=$casenum. RANK VARIABLES= caseid by id /RANK INTO case_no. LIST. MATCH FILES FILE=* /BY id /LAST=last. VECTOR award (10F8.0). COMPUTE award (case_no)=Semester1. AGGREGATE /OUTFILE=* /BREAK=studentid /Award1 to Award10 = MAX (Award1 to Award10). EXE. I know there are several brute force approaches (which I have tried and do work), but with 100 different award types in some of the files it becomes time consuming and just seems pretty inelegant. Any help would be appreciated! Peace, Jake -- Jacob PK Gross Project on Academic Success-- Associate Director for Research Indiana University 2805 E. Tenth Street Smith Research Center, Suite 100 Bloomington, IN 47408 Phone: 812-855-1604 Fax: 812-855-7422 |
In reply to this post by Jake Gross
At 09:08 AM 2/2/2007, Jake Gross wrote:
>I am trying to find a quick way to restructure several large data >files. Aid data are included in the files for two semesters with >students appearing 1-9 times depending on how many awards they >received. The structure is roughly the following. > >DATA LIST LIST / ID Award_Type Semester1 Semester2 >BEGIN DATA. >1 Stafford 500 500 >1 Academic 456 234 >1 Pell 300 345 >2 Pell 344 345 >2 Stafford 500 678 >3 Stafford 467 345 >4 Merit 500 500 >4 Academic 567 578 >END DATA. > >I would like the file to appear something like the following, with the >Award_Type variable serving as the variable label for each new >variable and each student appearing only once in the file >(e.g.,Semester1Award1): > >ID Semester1Award1 Semester1Award2 >Semester1Award3 Semester2Award1 FIRST (and I'm almost as much a stuck record on this, as on EXECUTE), strongly consider not doing it at all. 'Long' organization (which is what you have now) is more flexible for most purposes than is 'Wide' organization (what you're looking for). This depends on what you want to do, of course. But look once more at what you want to do. The restructuring is a means to that end; in many cases, there are other means that work better. SECOND, at 10:00 AM 2/2/2007, Beadle, ViAnn wrote: >Have you retried Restructure File on the Data command? That is always good advice in such cases, though I'd phrase it as "Have you tried CasesToVars, which is available through Data>Restructure File on the menus?" That is, think syntax as fundamental, with the menus as aids to creating the syntax. THIRD, the results of her syntax may not quite be what you want. Try this, with VECTOR/AGGREGATE logic as I think you can do it with VECTOR/AGGREGATE logic, as you were trying. The following logic is tested; SPSS 14 draft output. Using AGGREGATE function SUM instead of MAX handles correctly multiple awards of the same type, for the same student and semester. (WRR: Code & output not kept separately.) DATASET ACTIVATE AwardsLong. DATASET COPY Wide_WRR. DATASET ACTIVATE Wide_WRR. VECTOR Sem1Awrd (5,F5) /Sem2Awrd (5,F5). RECODE Award_Type ('Stafford' = 1) ('Academic' = 2) ('Pell' = 3) ('Merit' = 4) (ELSE = 5) INTO #TypeIdx. COMPUTE Sem1Awrd(#TypeIdx) = Semester1. COMPUTE Sem2Awrd(#TypeIdx) = Semester2. AGGREGATE OUTFILE=* /BREAK=ID /Sem1Awrd1 = SUM(Sem1Awrd1) /Sem1Awrd2 = SUM(Sem1Awrd2) /Sem1Awrd3 = SUM(Sem1Awrd3) /Sem1Awrd4 = SUM(Sem1Awrd4) /Sem1Awrd5 = SUM(Sem1Awrd5) /Sem2Awrd1 = SUM(Sem2Awrd1) /Sem2Awrd2 = SUM(Sem2Awrd2) /Sem2Awrd3 = SUM(Sem2Awrd3) /Sem2Awrd4 = SUM(Sem2Awrd4) /Sem2Awrd5 = SUM(Sem2Awrd5). FORMATS Sem1Awrd1 TO Sem2Awrd5 (F5). VAR LABELS Sem1Awrd1 'Semester 1 Stafford award' Sem1Awrd2 'Semester 1 Academic award' Sem1Awrd3 'Semester 1 Pell award' Sem1Awrd4 'Semester 1 Merit award' Sem1Awrd5 'Semester 1 UNKNOWN award' Sem2Awrd1 'Semester 2 Stafford award' Sem2Awrd2 'Semester 2 Academic award' Sem2Awrd3 'Semester 2 Pell award' Sem2Awrd4 'Semester 2 Merit award' Sem2Awrd5 'Semester 2 UNKNOWN award'. TEMPORARY. STRING SPACE(A11). LIST. List |-----------------------------|---------------------------| |Output Created |02-FEB-2007 12:46:40 | |-----------------------------|---------------------------| Sem1A Sem1A Sem1A Sem1A Sem1A Sem2A Sem2A Sem2A Sem2A Sem2A ID wrd1 wrd2 wrd3 wrd4 wrd5 wrd1 wrd2 wrd3 wrd4 wrd5 SPACE 1 500 456 300 . . 500 234 345 . . 2 500 . 344 . . 678 . 345 . . 3 467 . . . . 345 . . . . 4 . 567 . 500 . . 578 . 500 . Number of cases read: 4 Number of cases listed: 4 FOURTH, you write, >with 100 different award types in some of the files it becomes time >consuming and just seems pretty inelegant. With 100 different award types, this approach gives you 200 data variables: one for each possible award type, for each semester. That's clumsy to work with. You could also do your 'wide' organization as pairs of variables, like this which allows up to three awards per student per semester (it can be expanded to allow the 9 you want, with 36 variables: type and amount for each of 9 awards, for each of two semesters): ID Sem1Typ1 Sem1Amt1 Sem1Typ2 Sem1Amt2 Sem1Typ3 Sem1Amt3... FORMATS Sem1Typ1 Sem1Typ2 Sem1Typ3 Sem2Typ1 Sem2Typ2 Sem2Typ3 (F3). VALUE LABELS Sem1Typ1 Sem1Typ2 Sem1Typ3 Sem2Typ1 Sem2Typ2 Sem2Typ3 001 'Stafford' 002 'Academic' 003 'Pell' 004 'Merit' ... FORMATS Sem1Amt1 Sem1Amt2 Sem1Amt3 Sem2Amt1 Sem2Amt2 Sem2Amt3 (F5). But that form is even clumsier. =================== APPENDIX: Test data =================== From ViAnn Beadle's posting: DATA LIST LIST / ID (f5) Award_Type (a8) Semester1 (f5) Semester2 (f5) . BEGIN DATA. 1 Stafford 500 500 1 Academic 456 234 1 Pell 300 345 2 Pell 344 345 2 Stafford 500 678 3 Stafford 467 345 4 Merit 500 500 4 Academic 567 578 END DATA. *...SAVE OUTFILE='C:\Program Files\SPSS\awards.sav' /COMPRESSED. DATASET NAME AwardsLong. |
In reply to this post by Beadle, ViAnn
Hi,
I am a biginner in using SPSS, I am creating dummy variables and recoding some categorical variables. These are easy to do with the following syntax, but I will need to repeat 35 times doing the same thing. Could anyone sugest how to use Vector and Loop Structure or Macro to do the job? Thanks very much. Wayne ************************** *Example 1 if(age=15)age15=1. if(age~=15)age15=0. if(age=16)age16=1. if(age~=16)age16=0. ...... if(age=49)age49=1. if(age~=49)age49=0. ************************** *Example 2 do if yage15<2000. recode marry15 to marry49 (0=1). end if. execute. do if yage16<2000. recode marry16 to marry49 (0=1). end if. execute. ...... do if yage49<2000. recode marry49 (0=1). end if. execute. ************************** *Example 3 do if ((yage15=par1)|(yage15=par2)|(yage15=par3)|(yage15=par4)|(yage15=par5)|(yage15=par6)|(yage15=par7)|(yage15=par8)). recode child15 (0=1). end if. execute. do if ((yage16=par1)|(yage16=par2)|(yage16=par3)|(yage16=par4)|(yage16=par5)|(yage16=par6)|(yage16=par7)|(yage16=par8)). recode child16 (0=1). end if. execute. ...... do if ((yage49=par1)|(yage49=par2)|(yage49=par3)|(yage49=par4)|(yage49=par5)|(yage49=par6)|(yage49=par7)|(yage49=par8)). recode child49 (0=1). end if. execute. *************************** |
Assuming age is an integer between 15 and 49 inclusive, and you want new variables for all the possible age values, you could solve the first problems with DO REPEAT:
do repeat x=age15 to age49 /y=15 to 49. compute x=(age=y). end repeat. -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Chen Wei Sent: Friday, February 02, 2007 12:21 PM To: [hidden email] Subject: Recoding Hi, I am a biginner in using SPSS, I am creating dummy variables and recoding some categorical variables. These are easy to do with the following syntax, but I will need to repeat 35 times doing the same thing. Could anyone sugest how to use Vector and Loop Structure or Macro to do the job? Thanks very much. Wayne ************************** *Example 1 if(age=15)age15=1. if(age~=15)age15=0. if(age=16)age16=1. if(age~=16)age16=0. ...... if(age=49)age49=1. if(age~=49)age49=0. ************************** *Example 2 do if yage15<2000. recode marry15 to marry49 (0=1). end if. execute. do if yage16<2000. recode marry16 to marry49 (0=1). end if. execute. ...... do if yage49<2000. recode marry49 (0=1). end if. execute. ************************** *Example 3 do if ((yage15=par1)|(yage15=par2)|(yage15=par3)|(yage15=par4)|(yage15=par5)|(yage15=par6)|(yage15=par7)|(yage15=par8)). recode child15 (0=1). end if. execute. do if ((yage16=par1)|(yage16=par2)|(yage16=par3)|(yage16=par4)|(yage16=par5)|(yage16=par6)|(yage16=par7)|(yage16=par8)). recode child16 (0=1). end if. execute. ...... do if ((yage49=par1)|(yage49=par2)|(yage49=par3)|(yage49=par4)|(yage49=par5)|(yage49=par6)|(yage49=par7)|(yage49=par8)). recode child49 (0=1). end if. execute. *************************** |
In reply to this post by Chen Wei-2
This is very easy to do if you have the Programmability Extension installed. Note that it is easier to use
"Compute age16 = (age=16) ." instead of a pair of IF statements. I have left Example 3 as an exercise for the List. Read the article "Converting an SPSS Syntax Job to a Python Job" at http://www.spss.com/devcentral/index.cfm?pg=articleDet&artId=5 to get started; follow up with the Programming and Data Management book chapters 13 and 20: http://www.spss.com/spss/data_management_book.htm http://www.spss.com/spss/SPSSdatamgmt_4e.pdf It took me less time to write and test the code below than it did to write this e-mail :-), so I hope some of you will find the effort worthwhile. begin program python. import spss for age in xrange(16, 50): spss.Submit("compute age%(age)s = (age=%(age)s)." % locals()) spss.Submit("execute.") end program. begin program python. import spss cmds = """ do if yage%(age)s<2000. recode marry%(age)s to marry49 (0=1). end if. """ for age in xrange(16, 50): spss.Submit(cmds % locals()) spss.Submit("execute.") end program. -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Chen Wei Sent: Friday, February 02, 2007 12:21 PM To: [hidden email] Subject: Recoding Hi, I am a biginner in using SPSS, I am creating dummy variables and recoding some categorical variables. These are easy to do with the following syntax, but I will need to repeat 35 times doing the same thing. Could anyone sugest how to use Vector and Loop Structure or Macro to do the job? Thanks very much. Wayne ************************** *Example 1 if(age=15)age15=1. if(age~=15)age15=0. if(age=16)age16=1. if(age~=16)age16=0. ...... if(age=49)age49=1. if(age~=49)age49=0. ************************** *Example 2 do if yage15<2000. recode marry15 to marry49 (0=1). end if. execute. do if yage16<2000. recode marry16 to marry49 (0=1). end if. execute. ...... do if yage49<2000. recode marry49 (0=1). end if. execute. ************************** *Example 3 do if ((yage15=par1)|(yage15=par2)|(yage15=par3)|(yage15=par4)|(yage15=par5)|(yage15=par6)|(yage15=par7)|(yage15=par8)). recode child15 (0=1). end if. execute. do if ((yage16=par1)|(yage16=par2)|(yage16=par3)|(yage16=par4)|(yage16=par5)|(yage16=par6)|(yage16=par7)|(yage16=par8)). recode child16 (0=1). end if. execute. ...... do if ((yage49=par1)|(yage49=par2)|(yage49=par3)|(yage49=par4)|(yage49=par5)|(yage49=par6)|(yage49=par7)|(yage49=par8)). recode child49 (0=1). end if. execute. *************************** |
In reply to this post by Chen Wei-2
Hi Wayne,
Example 1 (untested!): VECTOR age(49). /* creation of 49 empty age vars. VECTOR v_age=age1 TO age49. /* definition of vector/array 'v_age'. LOOP #I=1 TO 49. + IF (age = #I) v_age(#I) = 1. + IF (age ne #I) v_age(#I) = 0. END LOOP. EXE. --> as you mentioned already, a macro loop would also be an option. Are you sure you want all values not being #i to be recoded to 0 -- INCLUDING MISSING VALUES? If so, you could make it easier just by replacing the IF rules with: COUNT v_age(#I) = age (#I). If not, just add a rule that retains the original missing values (user and/or system missing values). Cheers! Albert-Jan --- Chen Wei <[hidden email]> wrote: > Hi, > > I am a biginner in using SPSS, I am creating dummy > variables and recoding some categorical variables. > These are easy to do with the following syntax, but > I will need to repeat 35 times doing the same thing. > Could anyone sugest how to use Vector and Loop > Structure or Macro to do the job? Thanks very much. > > Wayne > > ************************** > *Example 1 > > if(age=15)age15=1. > if(age~=15)age15=0. > > if(age=16)age16=1. > if(age~=16)age16=0. > ...... > if(age=49)age49=1. > if(age~=49)age49=0. > ************************** > *Example 2 > > do if yage15<2000. > recode marry15 to marry49 (0=1). > end if. > execute. > > do if yage16<2000. > recode marry16 to marry49 (0=1). > end if. > execute. > ...... > do if yage49<2000. > recode marry49 (0=1). > end if. > execute. > ************************** > *Example 3 > > do if > > recode child15 (0=1). > end if. > execute. > > do if > ((yage16=par1)|(yage16=par2)|(yage16=par3)|(yage16=par4)|(yage16=par5)|(yage16=par6)|(yage16=par7)|(yage16=par8)). > recode child16 (0=1). > end if. > execute. > ...... > do if > ((yage49=par1)|(yage49=par2)|(yage49=par3)|(yage49=par4)|(yage49=par5)|(yage49=par6)|(yage49=par7)|(yage49=par8)). > recode child49 (0=1). > end if. > execute. > *************************** > ____________________________________________________________________________________ No need to miss a message. Get email on-the-go with Yahoo! Mail for Mobile. Get started. http://mobile.yahoo.com/mail |
Note that using the ANY function will greatly simplify the test criteria in example 3.
Change do if ((yage15=par1)|(yage15=par2)|(yage15=par3)|(yage15=par4)|(yage15=par5)|(yage15=par6)|(yage15=par7)|(yage15=par8)). into do if any(ya15, par1 to par8). HTH, Jon Peck -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Albert-jan Roskam Sent: Saturday, February 03, 2007 4:39 AM To: [hidden email] Subject: Re: [SPSSX-L] Recoding Hi Wayne, Example 1 (untested!): VECTOR age(49). /* creation of 49 empty age vars. VECTOR v_age=age1 TO age49. /* definition of vector/array 'v_age'. LOOP #I=1 TO 49. + IF (age = #I) v_age(#I) = 1. + IF (age ne #I) v_age(#I) = 0. END LOOP. EXE. --> as you mentioned already, a macro loop would also be an option. Are you sure you want all values not being #i to be recoded to 0 -- INCLUDING MISSING VALUES? If so, you could make it easier just by replacing the IF rules with: COUNT v_age(#I) = age (#I). If not, just add a rule that retains the original missing values (user and/or system missing values). Cheers! Albert-Jan --- Chen Wei <[hidden email]> wrote: > Hi, > > I am a biginner in using SPSS, I am creating dummy > variables and recoding some categorical variables. > These are easy to do with the following syntax, but > I will need to repeat 35 times doing the same thing. > Could anyone sugest how to use Vector and Loop > Structure or Macro to do the job? Thanks very much. > > Wayne > > ************************** > *Example 1 > > if(age=15)age15=1. > if(age~=15)age15=0. > > if(age=16)age16=1. > if(age~=16)age16=0. > ...... > if(age=49)age49=1. > if(age~=49)age49=0. > ************************** > *Example 2 > > do if yage15<2000. > recode marry15 to marry49 (0=1). > end if. > execute. > > do if yage16<2000. > recode marry16 to marry49 (0=1). > end if. > execute. > ...... > do if yage49<2000. > recode marry49 (0=1). > end if. > execute. > ************************** > *Example 3 > > do if > > recode child15 (0=1). > end if. > execute. > > do if > ((yage16=par1)|(yage16=par2)|(yage16=par3)|(yage16=par4)|(yage16=par5)|(yage16=par6)|(yage16=par7)|(yage16=par8)). > recode child16 (0=1). > end if. > execute. > ...... > do if > ((yage49=par1)|(yage49=par2)|(yage49=par3)|(yage49=par4)|(yage49=par5)|(yage49=par6)|(yage49=par7)|(yage49=par8)). > recode child49 (0=1). > end if. > execute. > *************************** > ____________________________________________________________________________________ No need to miss a message. Get email on-the-go with Yahoo! Mail for Mobile. Get started. http://mobile.yahoo.com/mail |
Free forum by Nabble | Edit this page |