SPSSX Discussion

How to restructure financial aid data file?

Classic

List

Threaded

8 messages Options

Jake Gross

How to restructure financial aid data file?

Hello all,

I am trying to find a quick way to restructure several large data files with
information about student financial aid awards. Aid data are included in the
files for two semesters with students appearing 1-9 times depending on how
many awards they received. The structure is roughly the following.

DATA LIST LIST / ID Award_Type Semester1 Semester2
BEGIN DATA.
1 Stafford 500 500
1 Academic 456 234
1 Pell 300 345
2 Pell 344 345
2 Stafford 500 678
3 Stafford 467 345
4 Merit 500 500
4 Academic 567 578
END DATA.

I would like the file to appear something like the following, with the
Award_Type variable serving as the variable label for each new variable and
each student appearing only once in the file (e.g.,Semester1Award1):

ID Semester1Award1 Semester1Award2 Semester1Award3
Semester2Award2

I have tried the following approach, but have been unable to find a way to
restructure the file to retain information about the award type.

COMPUTE caseid=$casenum.
RANK VARIABLES= caseid by id
/RANK INTO case_no.
LIST.
MATCH FILES FILE=*
/BY id
/LAST=last.
VECTOR award (10F8.0).
COMPUTE award (case_no)=Semester1.
AGGREGATE
/OUTFILE=*
/BREAK=studentid
/Award1 to Award10 = MAX (Award1 to Award10).
EXE.

I know there are several brute force approaches (which I have tried and do
work), but with 100 different award types in some of the files it becomes
time consuming and just seems pretty inelegant.

Any help would be appreciated!

Peace,
Jake

--
Jacob PK Gross
Project on Academic Success-- Associate Director for Research
Indiana University
2805 E. Tenth Street
Smith Research Center, Suite 100
Bloomington, IN 47408
Phone: 812-855-1604
Fax: 812-855-7422

Beadle, ViAnn

Re: How to restructure financial aid data file?

Have you retried Restructure File on the Data command? After fixing up your data list command (formats--since you've got a string lurking here and command terminator), and running through the restructure wizard, here's the syntax:

DATA LIST LIST / ID (f5) Award_Type (a8) Semester1 (f5) Semester2 (f5) .
BEGIN DATA.
1 Stafford 500 500
1 Academic 456 234
1 Pell 300 345
2 Pell 344 345
2 Stafford 500 678
3 Stafford 467 345
4 Merit 500 500
4 Academic 567 578
END DATA.
SAVE OUTFILE='C:\Program Files\SPSS\awards.sav'
/COMPRESSED.
CASESTOVARS
/ID = ID
/GROUPBY = VARIABLE .

Try it yourself to see if this meets your need.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jake Gross
Sent: Friday, February 02, 2007 8:09 AM
To: [hidden email]
Subject: How to restructure financial aid data file?

Hello all,

I am trying to find a quick way to restructure several large data files with
information about student financial aid awards. Aid data are included in the
files for two semesters with students appearing 1-9 times depending on how
many awards they received. The structure is roughly the following.

DATA LIST LIST / ID Award_Type Semester1 Semester2
BEGIN DATA.
1 Stafford 500 500
1 Academic 456 234
1 Pell 300 345
2 Pell 344 345
2 Stafford 500 678
3 Stafford 467 345
4 Merit 500 500
4 Academic 567 578
END DATA.

I would like the file to appear something like the following, with the
Award_Type variable serving as the variable label for each new variable and
each student appearing only once in the file (e.g.,Semester1Award1):

ID Semester1Award1 Semester1Award2 Semester1Award3
Semester2Award2

I have tried the following approach, but have been unable to find a way to
restructure the file to retain information about the award type.

COMPUTE caseid=$casenum.
RANK VARIABLES= caseid by id
/RANK INTO case_no.
LIST.
MATCH FILES FILE=*
/BY id
/LAST=last.
VECTOR award (10F8.0).
COMPUTE award (case_no)=Semester1.
AGGREGATE
/OUTFILE=*
/BREAK=studentid
/Award1 to Award10 = MAX (Award1 to Award10).
EXE.

I know there are several brute force approaches (which I have tried and do
work), but with 100 different award types in some of the files it becomes
time consuming and just seems pretty inelegant.

Any help would be appreciated!

Peace,
Jake

--
Jacob PK Gross
Project on Academic Success-- Associate Director for Research
Indiana University
2805 E. Tenth Street
Smith Research Center, Suite 100
Bloomington, IN 47408
Phone: 812-855-1604
Fax: 812-855-7422

Richard Ristow

Re: How to restructure financial aid data file?

In reply to this post by Jake Gross

At 09:08 AM 2/2/2007, Jake Gross wrote:

>I am trying to find a quick way to restructure several large data
>files. Aid data are included in the files for two semesters with
>students appearing 1-9 times depending on how many awards they
>received. The structure is roughly the following.
>
>DATA LIST LIST / ID Award_Type Semester1 Semester2
>BEGIN DATA.
>1 Stafford 500 500
>1 Academic 456 234
>1 Pell 300 345
>2 Pell 344 345
>2 Stafford 500 678
>3 Stafford 467 345
>4 Merit 500 500
>4 Academic 567 578
>END DATA.
>
>I would like the file to appear something like the following, with the
>Award_Type variable serving as the variable label for each new
>variable and each student appearing only once in the file
>(e.g.,Semester1Award1):
>
>ID Semester1Award1 Semester1Award2
>Semester1Award3 Semester2Award1

FIRST (and I'm almost as much a stuck record on this, as on EXECUTE),
strongly consider not doing it at all. 'Long' organization (which is
what you have now) is more flexible for most purposes than is 'Wide'
organization (what you're looking for). This depends on what you want
to do, of course. But look once more at what you want to do. The
restructuring is a means to that end; in many cases, there are other
means that work better.

SECOND, at 10:00 AM 2/2/2007, Beadle, ViAnn wrote:

>Have you retried Restructure File on the Data command?

That is always good advice in such cases, though I'd phrase it as "Have
you tried CasesToVars, which is available through Data>Restructure File
on the menus?" That is, think syntax as fundamental, with the menus as
aids to creating the syntax.

THIRD, the results of her syntax may not quite be what you want. Try
this, with VECTOR/AGGREGATE logic as I think you can do it with
VECTOR/AGGREGATE logic, as you were trying. The following logic is
tested; SPSS 14 draft output. Using AGGREGATE function SUM instead of
MAX handles correctly multiple awards of the same type, for the same
student and semester. (WRR: Code & output not kept separately.)

DATASET ACTIVATE AwardsLong.
DATASET COPY Wide_WRR.
DATASET ACTIVATE Wide_WRR.

VECTOR Sem1Awrd (5,F5)
/Sem2Awrd (5,F5).

RECODE Award_Type
('Stafford' = 1)
('Academic' = 2)
('Pell' = 3)
('Merit' = 4)
(ELSE = 5)
INTO #TypeIdx.

COMPUTE Sem1Awrd(#TypeIdx) = Semester1.
COMPUTE Sem2Awrd(#TypeIdx) = Semester2.

AGGREGATE OUTFILE=*
/BREAK=ID
/Sem1Awrd1 = SUM(Sem1Awrd1)
/Sem1Awrd2 = SUM(Sem1Awrd2)
/Sem1Awrd3 = SUM(Sem1Awrd3)
/Sem1Awrd4 = SUM(Sem1Awrd4)
/Sem1Awrd5 = SUM(Sem1Awrd5)
/Sem2Awrd1 = SUM(Sem2Awrd1)
/Sem2Awrd2 = SUM(Sem2Awrd2)
/Sem2Awrd3 = SUM(Sem2Awrd3)
/Sem2Awrd4 = SUM(Sem2Awrd4)
/Sem2Awrd5 = SUM(Sem2Awrd5).

FORMATS Sem1Awrd1 TO Sem2Awrd5 (F5).

VAR LABELS
Sem1Awrd1 'Semester 1 Stafford award'
Sem1Awrd2 'Semester 1 Academic award'
Sem1Awrd3 'Semester 1 Pell award'
Sem1Awrd4 'Semester 1 Merit award'
Sem1Awrd5 'Semester 1 UNKNOWN award'
Sem2Awrd1 'Semester 2 Stafford award'
Sem2Awrd2 'Semester 2 Academic award'
Sem2Awrd3 'Semester 2 Pell award'
Sem2Awrd4 'Semester 2 Merit award'
Sem2Awrd5 'Semester 2 UNKNOWN award'.

TEMPORARY.
STRING SPACE(A11).
LIST.

List
|-----------------------------|---------------------------|
|Output Created |02-FEB-2007 12:46:40 |
|-----------------------------|---------------------------|
Sem1A Sem1A Sem1A Sem1A Sem1A Sem2A Sem2A Sem2A Sem2A Sem2A
ID
wrd1 wrd2 wrd3 wrd4 wrd5 wrd1 wrd2 wrd3 wrd4 wrd5 SPACE

1 500 456 300 . . 500 234 345 . .
2 500 . 344 . . 678 . 345 . .
3 467 . . . . 345 . . . .
4 . 567 . 500 . . 578 . 500 .

Number of cases read: 4 Number of cases listed: 4

FOURTH, you write,

>with 100 different award types in some of the files it becomes time
>consuming and just seems pretty inelegant.

With 100 different award types, this approach gives you 200 data
variables: one for each possible award type, for each semester. That's
clumsy to work with.

You could also do your 'wide' organization as pairs of variables, like
this which allows up to three awards per student per semester (it can
be expanded to allow the 9 you want, with 36 variables: type and amount
for each of 9 awards, for each of two semesters):

ID Sem1Typ1 Sem1Amt1 Sem1Typ2 Sem1Amt2 Sem1Typ3 Sem1Amt3...

FORMATS
Sem1Typ1 Sem1Typ2 Sem1Typ3
Sem2Typ1 Sem2Typ2 Sem2Typ3
(F3).
VALUE LABELS
Sem1Typ1 Sem1Typ2 Sem1Typ3
Sem2Typ1 Sem2Typ2 Sem2Typ3
001 'Stafford'
002 'Academic'
003 'Pell'
004 'Merit'
...
FORMATS
Sem1Amt1 Sem1Amt2 Sem1Amt3
Sem2Amt1 Sem2Amt2 Sem2Amt3
(F5).

But that form is even clumsier.
===================
APPENDIX: Test data
===================
From ViAnn Beadle's posting:

DATA LIST LIST
/ ID (f5) Award_Type (a8) Semester1 (f5) Semester2 (f5) .
BEGIN DATA.
1 Stafford 500 500
1 Academic 456 234
1 Pell 300 345
2 Pell 344 345
2 Stafford 500 678
3 Stafford 467 345
4 Merit 500 500
4 Academic 567 578
END DATA.
*...SAVE OUTFILE='C:\Program Files\SPSS\awards.sav'
/COMPRESSED.
DATASET NAME AwardsLong.

Chen Wei-2

Recoding

In reply to this post by Beadle, ViAnn

Hi,

I am a biginner in using SPSS, I am creating dummy variables and recoding some categorical variables. These are easy to do with the following syntax, but I will need to repeat 35 times doing the same thing. Could anyone sugest how to use Vector and Loop Structure or Macro to do the job? Thanks very much.

Wayne

**************************
*Example 1

if(age=15)age15=1.
if(age~=15)age15=0.

if(age=16)age16=1.
if(age~=16)age16=0.
......
if(age=49)age49=1.
if(age~=49)age49=0.
**************************
*Example 2

do if yage15<2000.
recode marry15 to marry49 (0=1).
end if.
execute.

do if yage16<2000.
recode marry16 to marry49 (0=1).
end if.
execute.
......
do if yage49<2000.
recode marry49 (0=1).
end if.
execute.
**************************
*Example 3

do if ((yage15=par1)|(yage15=par2)|(yage15=par3)|(yage15=par4)|(yage15=par5)|(yage15=par6)|(yage15=par7)|(yage15=par8)).
recode child15 (0=1).
end if.
execute.

do if ((yage16=par1)|(yage16=par2)|(yage16=par3)|(yage16=par4)|(yage16=par5)|(yage16=par6)|(yage16=par7)|(yage16=par8)).
recode child16 (0=1).
end if.
execute.
......
do if ((yage49=par1)|(yage49=par2)|(yage49=par3)|(yage49=par4)|(yage49=par5)|(yage49=par6)|(yage49=par7)|(yage49=par8)).
recode child49 (0=1).
end if.
execute.
***************************

Oliver, Richard

Re: Recoding

Assuming age is an integer between 15 and 49 inclusive, and you want new variables for all the possible age values, you could solve the first problems with DO REPEAT:

do repeat
x=age15 to age49
/y=15 to 49.
compute x=(age=y).
end repeat.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Chen Wei
Sent: Friday, February 02, 2007 12:21 PM
To: [hidden email]
Subject: Recoding

Hi,

I am a biginner in using SPSS, I am creating dummy variables and recoding some categorical variables. These are easy to do with the following syntax, but I will need to repeat 35 times doing the same thing. Could anyone sugest how to use Vector and Loop Structure or Macro to do the job? Thanks very much.

Wayne

**************************
*Example 1

if(age=15)age15=1.
if(age~=15)age15=0.

if(age=16)age16=1.
if(age~=16)age16=0.
......
if(age=49)age49=1.
if(age~=49)age49=0.
**************************
*Example 2

do if yage15<2000.
recode marry15 to marry49 (0=1).
end if.
execute.

do if yage16<2000.
recode marry16 to marry49 (0=1).
end if.
execute.
......
do if yage49<2000.
recode marry49 (0=1).
end if.
execute.
**************************
*Example 3

do if ((yage15=par1)|(yage15=par2)|(yage15=par3)|(yage15=par4)|(yage15=par5)|(yage15=par6)|(yage15=par7)|(yage15=par8)).
recode child15 (0=1).
end if.
execute.

do if ((yage16=par1)|(yage16=par2)|(yage16=par3)|(yage16=par4)|(yage16=par5)|(yage16=par6)|(yage16=par7)|(yage16=par8)).
recode child16 (0=1).
end if.
execute.
......
do if ((yage49=par1)|(yage49=par2)|(yage49=par3)|(yage49=par4)|(yage49=par5)|(yage49=par6)|(yage49=par7)|(yage49=par8)).
recode child49 (0=1).
end if.
execute.
***************************

Bauer, John H.

Re: Recoding

In reply to this post by Chen Wei-2

This is very easy to do if you have the Programmability Extension installed. Note that it is easier to use
"Compute age16 = (age=16) ." instead of a pair of IF statements.

I have left Example 3 as an exercise for the List. Read the article "Converting an SPSS Syntax Job to a Python Job" at

http://www.spss.com/devcentral/index.cfm?pg=articleDet&artId=5

to get started; follow up with the Programming and Data Management book chapters 13 and 20:

http://www.spss.com/spss/data_management_book.htm
http://www.spss.com/spss/SPSSdatamgmt_4e.pdf

It took me less time to write and test the code below than it did to write this e-mail :-), so I hope some of you will find the effort worthwhile.

begin program python.
import spss

for age in xrange(16, 50):
spss.Submit("compute age%(age)s = (age=%(age)s)." % locals())
spss.Submit("execute.")
end program.

begin program python.
import spss

cmds = """
do if yage%(age)s<2000.
recode marry%(age)s to marry49 (0=1).
end if.
"""

for age in xrange(16, 50):
spss.Submit(cmds % locals())
spss.Submit("execute.")
end program.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Chen Wei
Sent: Friday, February 02, 2007 12:21 PM
To: [hidden email]
Subject: Recoding

Hi,

I am a biginner in using SPSS, I am creating dummy variables and recoding some categorical variables. These are easy to do with the following syntax, but I will need to repeat 35 times doing the same thing. Could anyone sugest how to use Vector and Loop Structure or Macro to do the job? Thanks very much.

Wayne

**************************
*Example 1

if(age=15)age15=1.
if(age~=15)age15=0.

if(age=16)age16=1.
if(age~=16)age16=0.
......
if(age=49)age49=1.
if(age~=49)age49=0.
**************************
*Example 2

do if yage15<2000.
recode marry15 to marry49 (0=1).
end if.
execute.

do if yage16<2000.
recode marry16 to marry49 (0=1).
end if.
execute.
......
do if yage49<2000.
recode marry49 (0=1).
end if.
execute.
**************************
*Example 3

do if ((yage15=par1)|(yage15=par2)|(yage15=par3)|(yage15=par4)|(yage15=par5)|(yage15=par6)|(yage15=par7)|(yage15=par8)).
recode child15 (0=1).
end if.
execute.

do if ((yage16=par1)|(yage16=par2)|(yage16=par3)|(yage16=par4)|(yage16=par5)|(yage16=par6)|(yage16=par7)|(yage16=par8)).
recode child16 (0=1).
end if.
execute.
......
do if ((yage49=par1)|(yage49=par2)|(yage49=par3)|(yage49=par4)|(yage49=par5)|(yage49=par6)|(yage49=par7)|(yage49=par8)).
recode child49 (0=1).
end if.
execute.
***************************

Albert-Jan Roskam

Re: Recoding

In reply to this post by Chen Wei-2

Hi Wayne,

Example 1 (untested!):
VECTOR age(49). /* creation of 49 empty age vars.
VECTOR v_age=age1 TO age49. /* definition of
vector/array 'v_age'.
LOOP #I=1 TO 49.
+ IF (age = #I) v_age(#I) = 1.
+ IF (age ne #I) v_age(#I) = 0.
END LOOP.
EXE.

--> as you mentioned already, a macro loop would also
be an option. Are you sure you want all values not
being #i to be recoded to 0 -- INCLUDING MISSING
VALUES? If so, you could make it easier just by
replacing the IF rules with:
COUNT v_age(#I) = age (#I).

If not, just add a rule that retains the original
missing values (user and/or system missing values).

Cheers!
Albert-Jan

--- Chen Wei <[hidden email]> wrote:

> Hi,
>
> I am a biginner in using SPSS, I am creating dummy
> variables and recoding some categorical variables.
> These are easy to do with the following syntax, but
> I will need to repeat 35 times doing the same thing.
> Could anyone sugest how to use Vector and Loop
> Structure or Macro to do the job? Thanks very much.
>
> Wayne
>
> **************************
> *Example 1
>
> if(age=15)age15=1.
> if(age~=15)age15=0.
>
> if(age=16)age16=1.
> if(age~=16)age16=0.
> ......
> if(age=49)age49=1.
> if(age~=49)age49=0.
> **************************
> *Example 2
>
> do if yage15<2000.
> recode marry15 to marry49 (0=1).
> end if.
> execute.
>
> do if yage16<2000.
> recode marry16 to marry49 (0=1).
> end if.
> execute.
> ......
> do if yage49<2000.
> recode marry49 (0=1).
> end if.
> execute.
> **************************
> *Example 3
>
> do if
>

Peck, Jon

Re: Recoding

Note that using the ANY function will greatly simplify the test criteria in example 3.
Change
do if
((yage15=par1)|(yage15=par2)|(yage15=par3)|(yage15=par4)|(yage15=par5)|(yage15=par6)|(yage15=par7)|(yage15=par8)).

into
do if any(ya15, par1 to par8).

HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Albert-jan Roskam
Sent: Saturday, February 03, 2007 4:39 AM
To: [hidden email]
Subject: Re: [SPSSX-L] Recoding

Hi Wayne,

Example 1 (untested!):
VECTOR age(49). /* creation of 49 empty age vars.
VECTOR v_age=age1 TO age49. /* definition of
vector/array 'v_age'.
LOOP #I=1 TO 49.
+ IF (age = #I) v_age(#I) = 1.
+ IF (age ne #I) v_age(#I) = 0.
END LOOP.
EXE.

--> as you mentioned already, a macro loop would also
be an option. Are you sure you want all values not
being #i to be recoded to 0 -- INCLUDING MISSING
VALUES? If so, you could make it easier just by
replacing the IF rules with:
COUNT v_age(#I) = age (#I).

If not, just add a rule that retains the original
missing values (user and/or system missing values).

Cheers!
Albert-Jan

--- Chen Wei <[hidden email]> wrote: