Restructure dataset with 3 demo vars into single var+ little tricky part

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Restructure dataset with 3 demo vars into single var+ little tricky part

Brandon Arnieri
Hello Folks,

I hope I have presented my problem sufficiently and I do apologize if
its not so clear.  I have perused www.spsstools.net but not exactly sure
which version applies to my situation.  Also, I am not sure how to
ensure the presence of levels of the demo vars in the case they did not
exist in particular Dataset A.  For example, ETHGRP is reported having 7
levels in Dataset B but only three exist in the this particular example.
So, I presume I would need to impute those levels nonexistent in Dataset
A as cases in restructured Dataset B.

Your time and assistance is greatly appreciated.

Have a great day,

Brandon
<>-barnxieriaxtgmaildxotcom-<>
remove x's in email

And, the problem:

I am attempting to restructure Dataset A below (var names in first line)
containing three demographic vars and four numeric counts of
enrollments.  Its shape is the following:

VALUE LABELS ETHGRP
'1' 'Black' '2' 'Asian,Pacific Islander' '3' 'White, Non-Hispanic' '4'
'Filipino'
'5' 'Hispanic' '6' 'Other' '7' 'Declined to State'.
VALUE LABELS SEX 1 'Female' 2 'Male' 3 'Unknown'.
VALUE LABELS AGEGRP 1 'Under 25' 2 '25 to 35' 3 '36 to 50' 4 'Over 50' 5
'Unknown'.

ETHGRP,SEX,AGEGRP,SUCC,TOT,RCVD,COMP
1,1,1,1,4,2,1
1,1,2,3,12,7,3
1,1,4,1,1,1,1
1,2,1,0,4,0,0
1,2,2,2,2,2,2
1,2,3,0,1,0,0
4,2,1,1,7,2,2
5,1,2,3,6,4,4
5,1,4,4,6,4,4
6,1,1,0,7,0,0

I would like obtain Dataset B below (var names in first line) by
collapsing the three demographic vars into a single var along with the
new corresponding enrollment counts.  However, its important that I
preserve all levels of each demo var even if it has not existed in
Dataset A.  The following is the restructured dataset I hope to arrive
at:

ALLDEMO,SUCC,TOT,RCVD,COMP
11,7,24,12,7
12,0,0,0,0
13,0,0,0,0
14,1,7,2,2
15,7,12,8,8
16,0,7,0,0
17,0,0,0,0
21,12,36,18,13
22,3,14,4,4
23,0,0,0,0
31,2,22,4,3
32,8,20,13,9
33,0,1,0,0
34,5,7,5,5
35,0,0,0,0
Reply | Threaded
Open this post in threaded view
|

Re: Restructure dataset with 3 demo vars into single var

Richard Ristow
At 02:10 PM 8/17/2006, Brandon Arnieri wrote:

>I hope I have presented my problem sufficiently and I do apologize if
>its not so clear.

Maybe somebody else has got this, in which case I apologize for being
thick, but I don't understand it. Could you explain more? You have

>ETHGRP,SEX,AGEGRP,SUCC,TOT,RCVD,COMP
>1,1,1,1,4,2,1
>1,1,2,3,12,7,3
>1,1,4,1,1,1,1
>1,2,1,0,4,0,0
>1,2,2,2,2,2,2
>1,2,3,0,1,0,0
>4,2,1,1,7,2,2
>5,1,2,3,6,4,4
>5,1,4,4,6,4,4
>6,1,1,0,7,0,0

and you want

>ALLDEMO,SUCC,TOT,RCVD,COMP
>11,7,24,12,7
>12,0,0,0,0
>13,0,0,0,0
>14,1,7,2,2
>15,7,12,8,8
>16,0,7,0,0
>17,0,0,0,0
>21,12,36,18,13
>22,3,14,4,4
>23,0,0,0,0
>31,2,22,4,3
>32,8,20,13,9
>33,0,1,0,0
>34,5,7,5,5
>35,0,0,0,0
by
>collapsing the three demographic vars into a single var along with the
>new corresponding enrollment counts

Skipping for the moment "collapsing the three demographic vars into a
single var", what do you mean by "the new corresponding enrollment
counts"? For example, how are output rows
11,7,24,12,7
22,3,14,4,4
calculated from the input?

Where there do seem to be correspondences, if
14,1,7,2,2
comes from
4,2,1,1,7,2,2
how is 14 calculated from 4,2,1?

(OK, I'll probably figure it out as soon as I send this.)

Thanks, and good luck,
Richard
Reply | Threaded
Open this post in threaded view
|

Re: Restructure dataset with 3 demo vars into single var+ little tricky part

Brandon Arnieri
In reply to this post by Brandon Arnieri
ALLDEMO is a string concatanation, i j where i represents the i-th demo var at its j-value:  (i used numbers to preserve the order)
so ALLDEMO = 11 denotes demo var1 (ethnic) and first level (black),
and ALLDEMO =12 denotes demo var1 (ethnic) and second level (asian),
....
ALLDEMO =22 denotes demo var2 (gender) and its second level (male),
...
ALLDEMO =35 denotes demo var3 (gender) and its fifth level (unknown).

this probably isn't the best approach...however, this end result is just the most convenient for my reporting module.  sorry if i didn't make this clearer in my earlier post.

        -----Original Message-----
        From: Simon Freidin [mailto:[hidden email]]
        Sent: Thu 8/17/06 4:55 PM
        To: Brandon Arnieri
        Cc:
        Subject: Re: Restructure dataset with 3 demo vars into single var+ little tricky part



        explain how you create ALLDEMO from ETHGRP,SEX,AGEGRP  (to the list!) or
        you won't get any answers

        cheers
        Simon

        At 04:10 AM 18/08/2006, you wrote:
        >Hello Folks,
        >
        >I hope I have presented my problem sufficiently and I do apologize if
        >its not so clear.  I have perused www.spsstools.net but not exactly sure
        >which version applies to my situation.  Also, I am not sure how to
        >ensure the presence of levels of the demo vars in the case they did not
        >exist in particular Dataset A.  For example, ETHGRP is reported having 7
        >levels in Dataset B but only three exist in the this particular example.
        >So, I presume I would need to impute those levels nonexistent in Dataset
        >A as cases in restructured Dataset B.
        >
        >Your time and assistance is greatly appreciated.
        >
        >Have a great day,
        >
        >Brandon
        ><>-barnxieriaxtgmaildxotcom-<>
        >remove x's in email
        >
        >And, the problem:
        >
        >I am attempting to restructure Dataset A below (var names in first line)
        >containing three demographic vars and four numeric counts of
        >enrollments.  Its shape is the following:
        >
        >VALUE LABELS ETHGRP
        >'1' 'Black' '2' 'Asian,Pacific Islander' '3' 'White, Non-Hispanic' '4'
        >'Filipino'
        >'5' 'Hispanic' '6' 'Other' '7' 'Declined to State'.
        >VALUE LABELS SEX 1 'Female' 2 'Male' 3 'Unknown'.
        >VALUE LABELS AGEGRP 1 'Under 25' 2 '25 to 35' 3 '36 to 50' 4 'Over 50' 5
        >'Unknown'.
        >
        >ETHGRP,SEX,AGEGRP,SUCC,TOT,RCVD,COMP
        >1,1,1,1,4,2,1
        >1,1,2,3,12,7,3
        >1,1,4,1,1,1,1
        >1,2,1,0,4,0,0
        >1,2,2,2,2,2,2
        >1,2,3,0,1,0,0
        >4,2,1,1,7,2,2
        >5,1,2,3,6,4,4
        >5,1,4,4,6,4,4
        >6,1,1,0,7,0,0
        >
        >I would like obtain Dataset B below (var names in first line) by
        >collapsing the three demographic vars into a single var along with the
        >new corresponding enrollment counts.  However, its important that I
        >preserve all levels of each demo var even if it has not existed in
        >Dataset A.  The following is the restructured dataset I hope to arrive
        >at:
        >
        >ALLDEMO,SUCC,TOT,RCVD,COMP
        >11,7,24,12,7
        >12,0,0,0,0
        >13,0,0,0,0
        >14,1,7,2,2
        >15,7,12,8,8
        >16,0,7,0,0
        >17,0,0,0,0
        >21,12,36,18,13
        >22,3,14,4,4
        >23,0,0,0,0
        >31,2,22,4,3
        >32,8,20,13,9
        >33,0,1,0,0
        >34,5,7,5,5
        >35,0,0,0,0


        Research Database Manager and Analyst
        Melbourne Institute of Applied Economic and Social Research
        The University of Melbourne
        Melbourne VIC 3010 Australia
        New Tel: (03) 8344 2085 New Fax: (03) 8344 2111
        http://www.melbourneinstitute.com/hilda/


Reply | Threaded
Open this post in threaded view
|

Re: Restructure dataset with 3 demo vars into single var+ little tricky part

Maguin, Eugene
Brandon,

Much better explanation but not yet complete. Please verify that 1) you want
to aggregate by ethgroup and sum across these variables: SUCC,TOT,RCVD,COMP.
I think this is what you are indicating but that little detail is left out.
And 2) create new records with values of 0 for SUCC,TOT,RCVD,COMP for
ethgroup categories not represented in the original datafile, and 3) then
create a new variable, which could be

Alldemo=10+ethgroup, or
Alldemo=concat('1',string(ethgroup(f1.0)).

Depending on whether you prefer alldemo to be numeric or string.

Is this all correct?

Gene Maguin
Reply | Threaded
Open this post in threaded view
|

Re: Restructure dataset with 3 demo vars into single var

Richard Ristow
In reply to this post by Brandon Arnieri
At 11:17 PM 8/17/2006, Brandon Arnieri wrote, following up:

>ALLDEMO is a string concatanation, i j where i represents the i-th
>demo var at its j-value:  (i used numbers to preserve the order)
>so ALLDEMO = 11 denotes demo var1 (ethnic) and first level (black),
>and ALLDEMO =12 denotes demo var1 (ethnic) and second level (asian),
>
>....
>ALLDEMO =22 denotes demo var2 (gender) and its second level (male),
>...
>ALLDEMO =35 denotes demo var3 ([agegrp]) and its fifth level
>(unknown).

So, like this? I'm making ALLDEMO a 3-character string, with a hyphen
between category and value:

VALUE LABELS ALLDEMO
     '1-1' 'Eth-Black'   '1-2' 'Eth-Asian'   '1-3' 'Eth-White'
     '1-4' 'Eth-Filpno'  '1-5' 'Eth-Hispan'  '1-6' 'Eth-Other'
     '1-7' 'Eth-NS'
     '2-1' 'Sex-Female'  '2-2' 'Sex-Male'    '2-3' 'Sex-UNKN'
     '3-1' 'Age <25'     '3-2' 'Age  25-35'  '3-3' 'Age  36-50'
     '3-4' 'Age >50'     '3-5' 'Age UNKN'.

So, a black male is in BOTH class 11 (ethnic=black) AND class 22
(gender=male), right?

>sorry if i didn't make this clearer in my earlier post.

Thanks for improving it. We wouldn't have had a CLUE from what you
posted before. Not a CLUE. If you'd even said what, say, ALLDEMO=35
meant, we'd maybe have had a chance.

Anyway, I think this does it. This is SPSS draft output. Listing (but
not the logic) uses function VALUELABEL, which was introduced in SPSS
14.

*  ................................................ .
GET FILE=DATAST_A.
LIST.
|-----------------------------|---------------------------|
|Output Created               |19-AUG-2006 01:06:48       |
|-----------------------------|---------------------------|
ETHGRP SEX AGEGRP SUCC TOT RCVD COMP

    1     1    1      1   4    2    1
    1     1    2      3  12    7    3
    1     1    4      1   1    1    1
    1     2    1      0   4    0    0
    1     2    2      2   2    2    2
    1     2    3      0   1    0    0
    4     2    1      1   7    2    2
    5     1    2      3   6    4    4
    5     1    4      4   6    4    4
    6     1    1      0   7    0    0

Number of cases read:  10    Number of cases listed:  10


*  I.   Write each input record as THREE records,   .
*  one for its category on each of the three        .
*  demographic variables.                           .

STRING ALLDEMO (A3).
VALUE LABELS ALLDEMO
     '1-1' 'Eth-Black'   '1-2' 'Eth-Asian'   '1-3' 'Eth-White'
     '1-4' 'Eth-Filpno'  '1-5' 'Eth-Hispan'  '1-6' 'Eth-Other'
     '1-7' 'Eth-NS'
     '2-1' 'Sex-Female'  '2-2' 'Sex-Male'    '2-3' 'Sex-UNKN'
     '3-1' 'Age <25'     '3-2' 'Age  25-35'  '3-3' 'Age  36-50'
     '3-4' 'Age >50'     '3-5' 'Age UNKN'.

VECTOR DEMGVAR = ETHGRP TO AGEGRP.
LOOP #DEMCAT = 1 TO 3.
.  COMPUTE #VALUE = DEMGVAR(#DEMCAT).
.  COMPUTE ALLDEMO =
         CONCAT(STRING(#DEMCAT,F1),'-',
                STRING(#VALUE ,F1)).
.  XSAVE OUTFILE=UNROLL
        /KEEP=ALLDEMO SUCC TOT RCVD COMP.
END LOOP.
EXECUTE.

GET FILE=UNROLL.
TEMPORARY.
STRING  DemgName (A10).
COMPUTE DemgName = VALUELABEL(ALLDEMO).
LIST    ALLDEMO DemgName  SUCC TOT RCVD COMP.
|-----------------------------|---------------------------|
|Output Created               |19-AUG-2006 01:06:49       |
|-----------------------------|---------------------------|
ALLDEMO DemgName   SUCC TOT RCVD COMP

1-1     Eth-Black     1   4    2    1
2-1     Sex-Female    1   4    2    1
3-1     Age <25       1   4    2    1
1-1     Eth-Black     3  12    7    3
2-1     Sex-Female    3  12    7    3
3-2     Age  25-35    3  12    7    3
1-1     Eth-Black     1   1    1    1
2-1     Sex-Female    1   1    1    1
3-4     Age >50       1   1    1    1
1-1     Eth-Black     0   4    0    0
2-2     Sex-Male      0   4    0    0
3-1     Age <25       0   4    0    0
1-1     Eth-Black     2   2    2    2
2-2     Sex-Male      2   2    2    2
3-2     Age  25-35    2   2    2    2
1-1     Eth-Black     0   1    0    0
2-2     Sex-Male      0   1    0    0
3-3     Age  36-50    0   1    0    0
1-4     Eth-Filpno    1   7    2    2
2-2     Sex-Male      1   7    2    2
3-1     Age <25       1   7    2    2
1-5     Eth-Hispan    3   6    4    4
2-1     Sex-Female    3   6    4    4
3-2     Age  25-35    3   6    4    4
1-5     Eth-Hispan    4   6    4    4
2-1     Sex-Female    4   6    4    4
3-4     Age >50       4   6    4    4
1-6     Eth-Other     0   7    0    0
2-1     Sex-Female    0   7    0    0
3-1     Age <25       0   7    0    0

Number of cases read:  30    Number of cases listed:  30


*  II.  Create a blank record for each possible     .
*  demographic category.                            .
NEW FILE.
DATA LIST FREE /ALLDEMO  (A3) DemgName(A10).
BEGIN DATA
     '1-1' 'Eth-Black'   '1-2' 'Eth-Asian'   '1-3' 'Eth-White'
     '1-4' 'Eth-Filpno'  '1-5' 'Eth-Hispan'  '1-6' 'Eth-Other'
     '1-7' 'Eth-NS'
     '2-1' 'Sex-Female'  '2-2' 'Sex-Male'    '2-3' 'Sex-UNKN'
     '3-1' 'Age <25'     '3-2' 'Age  25-35'  '3-3' 'Age  36-50'
     '3-4' 'Age >50'     '3-5' 'Age UNKN'.
END DATA.

 >Warning # 522
 >An unexpected end of file has been found in the middle of reading a
case.
 >The partial case will be ignored.  Check your input for a possible
missing
 >record.

 >Command line: 477  Current case: 16  Current splitfile group: 1
. /**/ LIST.
|-----------------------------|---------------------------|
|Output Created               |19-AUG-2006 01:06:49       |
|-----------------------------|---------------------------|
ALLDEMO DemgName

1-1     Eth-Black
1-2     Eth-Asian
1-3     Eth-White
1-4     Eth-Filpno
1-5     Eth-Hispan
1-6     Eth-Other
1-7     Eth-NS
2-1     Sex-Female
2-2     Sex-Male
2-3     Sex-UNKN
3-1     Age <25
3-2     Age  25-35
3-3     Age  36-50
3-4     Age >50
3-5     Age UNKN

Number of cases read:  15    Number of cases listed:  15


*  III. Combine blank and real records, aggregate   .

ADD FILES
    /FILE=*
    /FILE=UNROLL.
AGGREGATE OUTFILE=*
    /BREAK=ALLDEMO
    /    SUCC TOT RCVD COMP
    =SUM(SUCC TOT RCVD COMP).
FORMATS SUCC TOT RCVD COMP(F4).
RECODE  SUCC TOT RCVD COMP(SYSMIS = 0).

TEMPORARY.
STRING  DemgName (A10).
COMPUTE DemgName = VALUELABEL(ALLDEMO).
LIST    ALLDEMO DemgName  SUCC TOT RCVD COMP.
|-----------------------------|---------------------------|
|Output Created               |19-AUG-2006 01:06:49       |
|-----------------------------|---------------------------|
ALLDEMO DemgName   SUCC  TOT RCVD COMP

1-1     Eth-Black     7   24   12    7
1-2     Eth-Asian     0    0    0    0
1-3     Eth-White     0    0    0    0
1-4     Eth-Filpno    1    7    2    2
1-5     Eth-Hispan    7   12    8    8
1-6     Eth-Other     0    7    0    0
1-7     Eth-NS        0    0    0    0
2-1     Sex-Female   12   36   18   13
2-2     Sex-Male      3   14    4    4
2-3     Sex-UNKN      0    0    0    0
3-1     Age <25       2   22    4    3
3-2     Age  25-35    8   20   13    9
3-3     Age  36-50    0    1    0    0
3-4     Age >50       5    7    5    5
3-5     Age UNKN      0    0    0    0

Number of cases read:  15    Number of cases listed:  15