SPSSX Discussion

Restructure dataset with 3 demo vars into single var+ little tricky part

Classic

List

Threaded

5 messages Options

Brandon Arnieri

Restructure dataset with 3 demo vars into single var+ little tricky part

Hello Folks,

I hope I have presented my problem sufficiently and I do apologize if
its not so clear. I have perused www.spsstools.net but not exactly sure
which version applies to my situation. Also, I am not sure how to
ensure the presence of levels of the demo vars in the case they did not
exist in particular Dataset A. For example, ETHGRP is reported having 7
levels in Dataset B but only three exist in the this particular example.
So, I presume I would need to impute those levels nonexistent in Dataset
A as cases in restructured Dataset B.

Your time and assistance is greatly appreciated.

Have a great day,

Brandon
<>-barnxieriaxtgmaildxotcom-<>
remove x's in email

And, the problem:

I am attempting to restructure Dataset A below (var names in first line)
containing three demographic vars and four numeric counts of
enrollments. Its shape is the following:

VALUE LABELS ETHGRP
'1' 'Black' '2' 'Asian,Pacific Islander' '3' 'White, Non-Hispanic' '4'
'Filipino'
'5' 'Hispanic' '6' 'Other' '7' 'Declined to State'.
VALUE LABELS SEX 1 'Female' 2 'Male' 3 'Unknown'.
VALUE LABELS AGEGRP 1 'Under 25' 2 '25 to 35' 3 '36 to 50' 4 'Over 50' 5
'Unknown'.

ETHGRP,SEX,AGEGRP,SUCC,TOT,RCVD,COMP
1,1,1,1,4,2,1
1,1,2,3,12,7,3
1,1,4,1,1,1,1
1,2,1,0,4,0,0
1,2,2,2,2,2,2
1,2,3,0,1,0,0
4,2,1,1,7,2,2
5,1,2,3,6,4,4
5,1,4,4,6,4,4
6,1,1,0,7,0,0

I would like obtain Dataset B below (var names in first line) by
collapsing the three demographic vars into a single var along with the
new corresponding enrollment counts. However, its important that I
preserve all levels of each demo var even if it has not existed in
Dataset A. The following is the restructured dataset I hope to arrive
at:

ALLDEMO,SUCC,TOT,RCVD,COMP
11,7,24,12,7
12,0,0,0,0
13,0,0,0,0
14,1,7,2,2
15,7,12,8,8
16,0,7,0,0
17,0,0,0,0
21,12,36,18,13
22,3,14,4,4
23,0,0,0,0
31,2,22,4,3
32,8,20,13,9
33,0,1,0,0
34,5,7,5,5
35,0,0,0,0

Richard Ristow

Re: Restructure dataset with 3 demo vars into single var

At 02:10 PM 8/17/2006, Brandon Arnieri wrote:

>I hope I have presented my problem sufficiently and I do apologize if
>its not so clear.

Maybe somebody else has got this, in which case I apologize for being
thick, but I don't understand it. Could you explain more? You have

>ETHGRP,SEX,AGEGRP,SUCC,TOT,RCVD,COMP
>1,1,1,1,4,2,1
>1,1,2,3,12,7,3
>1,1,4,1,1,1,1
>1,2,1,0,4,0,0
>1,2,2,2,2,2,2
>1,2,3,0,1,0,0
>4,2,1,1,7,2,2
>5,1,2,3,6,4,4
>5,1,4,4,6,4,4
>6,1,1,0,7,0,0

and you want

>ALLDEMO,SUCC,TOT,RCVD,COMP
>11,7,24,12,7
>12,0,0,0,0
>13,0,0,0,0
>14,1,7,2,2
>15,7,12,8,8
>16,0,7,0,0
>17,0,0,0,0
>21,12,36,18,13
>22,3,14,4,4
>23,0,0,0,0
>31,2,22,4,3
>32,8,20,13,9
>33,0,1,0,0
>34,5,7,5,5
>35,0,0,0,0

by
>collapsing the three demographic vars into a single var along with the
>new corresponding enrollment counts

Skipping for the moment "collapsing the three demographic vars into a
single var", what do you mean by "the new corresponding enrollment
counts"? For example, how are output rows
11,7,24,12,7
22,3,14,4,4
calculated from the input?

Where there do seem to be correspondences, if
14,1,7,2,2
comes from
4,2,1,1,7,2,2
how is 14 calculated from 4,2,1?

(OK, I'll probably figure it out as soon as I send this.)

Thanks, and good luck,
Richard

Brandon Arnieri

Re: Restructure dataset with 3 demo vars into single var+ little tricky part

In reply to this post by Brandon Arnieri

ALLDEMO is a string concatanation, i j where i represents the i-th demo var at its j-value: (i used numbers to preserve the order)
so ALLDEMO = 11 denotes demo var1 (ethnic) and first level (black),
and ALLDEMO =12 denotes demo var1 (ethnic) and second level (asian),
....
ALLDEMO =22 denotes demo var2 (gender) and its second level (male),
...
ALLDEMO =35 denotes demo var3 (gender) and its fifth level (unknown).

this probably isn't the best approach...however, this end result is just the most convenient for my reporting module. sorry if i didn't make this clearer in my earlier post.

-----Original Message-----
From: Simon Freidin [mailto:[hidden email]]
Sent: Thu 8/17/06 4:55 PM
To: Brandon Arnieri
Cc:
Subject: Re: Restructure dataset with 3 demo vars into single var+ little tricky part

explain how you create ALLDEMO from ETHGRP,SEX,AGEGRP (to the list!) or
you won't get any answers

cheers
Simon

At 04:10 AM 18/08/2006, you wrote:
>Hello Folks,
>
>I hope I have presented my problem sufficiently and I do apologize if
>its not so clear. I have perused www.spsstools.net but not exactly sure
>which version applies to my situation. Also, I am not sure how to
>ensure the presence of levels of the demo vars in the case they did not
>exist in particular Dataset A. For example, ETHGRP is reported having 7
>levels in Dataset B but only three exist in the this particular example.
>So, I presume I would need to impute those levels nonexistent in Dataset
>A as cases in restructured Dataset B.
>
>Your time and assistance is greatly appreciated.
>
>Have a great day,
>
>Brandon
><>-barnxieriaxtgmaildxotcom-<>
>remove x's in email
>
>And, the problem:
>
>I am attempting to restructure Dataset A below (var names in first line)
>containing three demographic vars and four numeric counts of
>enrollments. Its shape is the following:
>
>VALUE LABELS ETHGRP
>'1' 'Black' '2' 'Asian,Pacific Islander' '3' 'White, Non-Hispanic' '4'
>'Filipino'
>'5' 'Hispanic' '6' 'Other' '7' 'Declined to State'.
>VALUE LABELS SEX 1 'Female' 2 'Male' 3 'Unknown'.
>VALUE LABELS AGEGRP 1 'Under 25' 2 '25 to 35' 3 '36 to 50' 4 'Over 50' 5
>'Unknown'.
>
>ETHGRP,SEX,AGEGRP,SUCC,TOT,RCVD,COMP
>1,1,1,1,4,2,1
>1,1,2,3,12,7,3
>1,1,4,1,1,1,1
>1,2,1,0,4,0,0
>1,2,2,2,2,2,2
>1,2,3,0,1,0,0
>4,2,1,1,7,2,2
>5,1,2,3,6,4,4
>5,1,4,4,6,4,4
>6,1,1,0,7,0,0
>
>I would like obtain Dataset B below (var names in first line) by
>collapsing the three demographic vars into a single var along with the
>new corresponding enrollment counts. However, its important that I
>preserve all levels of each demo var even if it has not existed in
>Dataset A. The following is the restructured dataset I hope to arrive
>at:
>
>ALLDEMO,SUCC,TOT,RCVD,COMP
>11,7,24,12,7
>12,0,0,0,0
>13,0,0,0,0
>14,1,7,2,2
>15,7,12,8,8
>16,0,7,0,0
>17,0,0,0,0
>21,12,36,18,13
>22,3,14,4,4
>23,0,0,0,0
>31,2,22,4,3
>32,8,20,13,9
>33,0,1,0,0
>34,5,7,5,5
>35,0,0,0,0

Research Database Manager and Analyst
Melbourne Institute of Applied Economic and Social Research
The University of Melbourne
Melbourne VIC 3010 Australia
New Tel: (03) 8344 2085 New Fax: (03) 8344 2111
http://www.melbourneinstitute.com/hilda/

Maguin, Eugene

Re: Restructure dataset with 3 demo vars into single var+ little tricky part

Brandon,

Much better explanation but not yet complete. Please verify that 1) you want
to aggregate by ethgroup and sum across these variables: SUCC,TOT,RCVD,COMP.
I think this is what you are indicating but that little detail is left out.
And 2) create new records with values of 0 for SUCC,TOT,RCVD,COMP for
ethgroup categories not represented in the original datafile, and 3) then
create a new variable, which could be

Alldemo=10+ethgroup, or
Alldemo=concat('1',string(ethgroup(f1.0)).

Depending on whether you prefer alldemo to be numeric or string.

Is this all correct?

Gene Maguin

Richard Ristow

Re: Restructure dataset with 3 demo vars into single var

In reply to this post by Brandon Arnieri

At 11:17 PM 8/17/2006, Brandon Arnieri wrote, following up:

>ALLDEMO is a string concatanation, i j where i represents the i-th
>demo var at its j-value: (i used numbers to preserve the order)
>so ALLDEMO = 11 denotes demo var1 (ethnic) and first level (black),
>and ALLDEMO =12 denotes demo var1 (ethnic) and second level (asian),
>
>....
>ALLDEMO =22 denotes demo var2 (gender) and its second level (male),
>...
>ALLDEMO =35 denotes demo var3 ([agegrp]) and its fifth level
>(unknown).

So, like this? I'm making ALLDEMO a 3-character string, with a hyphen
between category and value:

VALUE LABELS ALLDEMO
'1-1' 'Eth-Black' '1-2' 'Eth-Asian' '1-3' 'Eth-White'
'1-4' 'Eth-Filpno' '1-5' 'Eth-Hispan' '1-6' 'Eth-Other'
'1-7' 'Eth-NS'
'2-1' 'Sex-Female' '2-2' 'Sex-Male' '2-3' 'Sex-UNKN'
'3-1' 'Age <25' '3-2' 'Age 25-35' '3-3' 'Age 36-50'
'3-4' 'Age >50' '3-5' 'Age UNKN'.

So, a black male is in BOTH class 11 (ethnic=black) AND class 22
(gender=male), right?

>sorry if i didn't make this clearer in my earlier post.

Thanks for improving it. We wouldn't have had a CLUE from what you
posted before. Not a CLUE. If you'd even said what, say, ALLDEMO=35
meant, we'd maybe have had a chance.

Anyway, I think this does it. This is SPSS draft output. Listing (but
not the logic) uses function VALUELABEL, which was introduced in SPSS
14.

* ................................................ .
GET FILE=DATAST_A.
LIST.
|-----------------------------|---------------------------|
|Output Created |19-AUG-2006 01:06:48 |
|-----------------------------|---------------------------|
ETHGRP SEX AGEGRP SUCC TOT RCVD COMP

1 1 1 1 4 2 1
1 1 2 3 12 7 3
1 1 4 1 1 1 1
1 2 1 0 4 0 0
1 2 2 2 2 2 2
1 2 3 0 1 0 0
4 2 1 1 7 2 2
5 1 2 3 6 4 4
5 1 4 4 6 4 4
6 1 1 0 7 0 0

Number of cases read: 10 Number of cases listed: 10

* I. Write each input record as THREE records, .
* one for its category on each of the three .
* demographic variables. .

STRING ALLDEMO (A3).
VALUE LABELS ALLDEMO
'1-1' 'Eth-Black' '1-2' 'Eth-Asian' '1-3' 'Eth-White'
'1-4' 'Eth-Filpno' '1-5' 'Eth-Hispan' '1-6' 'Eth-Other'
'1-7' 'Eth-NS'
'2-1' 'Sex-Female' '2-2' 'Sex-Male' '2-3' 'Sex-UNKN'
'3-1' 'Age <25' '3-2' 'Age 25-35' '3-3' 'Age 36-50'
'3-4' 'Age >50' '3-5' 'Age UNKN'.

VECTOR DEMGVAR = ETHGRP TO AGEGRP.
LOOP #DEMCAT = 1 TO 3.
. COMPUTE #VALUE = DEMGVAR(#DEMCAT).
. COMPUTE ALLDEMO =
CONCAT(STRING(#DEMCAT,F1),'-',
STRING(#VALUE ,F1)).
. XSAVE OUTFILE=UNROLL
/KEEP=ALLDEMO SUCC TOT RCVD COMP.
END LOOP.
EXECUTE.

GET FILE=UNROLL.
TEMPORARY.
STRING DemgName (A10).
COMPUTE DemgName = VALUELABEL(ALLDEMO).
LIST ALLDEMO DemgName SUCC TOT RCVD COMP.
|-----------------------------|---------------------------|
|Output Created |19-AUG-2006 01:06:49 |
|-----------------------------|---------------------------|
ALLDEMO DemgName SUCC TOT RCVD COMP

1-1 Eth-Black 1 4 2 1
2-1 Sex-Female 1 4 2 1
3-1 Age <25 1 4 2 1
1-1 Eth-Black 3 12 7 3
2-1 Sex-Female 3 12 7 3
3-2 Age 25-35 3 12 7 3
1-1 Eth-Black 1 1 1 1
2-1 Sex-Female 1 1 1 1
3-4 Age >50 1 1 1 1
1-1 Eth-Black 0 4 0 0
2-2 Sex-Male 0 4 0 0
3-1 Age <25 0 4 0 0
1-1 Eth-Black 2 2 2 2
2-2 Sex-Male 2 2 2 2
3-2 Age 25-35 2 2 2 2
1-1 Eth-Black 0 1 0 0
2-2 Sex-Male 0 1 0 0
3-3 Age 36-50 0 1 0 0
1-4 Eth-Filpno 1 7 2 2
2-2 Sex-Male 1 7 2 2
3-1 Age <25 1 7 2 2
1-5 Eth-Hispan 3 6 4 4
2-1 Sex-Female 3 6 4 4
3-2 Age 25-35 3 6 4 4
1-5 Eth-Hispan 4 6 4 4
2-1 Sex-Female 4 6 4 4
3-4 Age >50 4 6 4 4
1-6 Eth-Other 0 7 0 0
2-1 Sex-Female 0 7 0 0
3-1 Age <25 0 7 0 0

Number of cases read: 30 Number of cases listed: 30

* II. Create a blank record for each possible .
* demographic category. .
NEW FILE.
DATA LIST FREE /ALLDEMO (A3) DemgName(A10).
BEGIN DATA
'1-1' 'Eth-Black' '1-2' 'Eth-Asian' '1-3' 'Eth-White'
'1-4' 'Eth-Filpno' '1-5' 'Eth-Hispan' '1-6' 'Eth-Other'
'1-7' 'Eth-NS'
'2-1' 'Sex-Female' '2-2' 'Sex-Male' '2-3' 'Sex-UNKN'
'3-1' 'Age <25' '3-2' 'Age 25-35' '3-3' 'Age 36-50'
'3-4' 'Age >50' '3-5' 'Age UNKN'.
END DATA.

>Warning # 522
>An unexpected end of file has been found in the middle of reading a
case.
>The partial case will be ignored. Check your input for a possible
missing
>record.

>Command line: 477 Current case: 16 Current splitfile group: 1
. /**/ LIST.
|-----------------------------|---------------------------|
|Output Created |19-AUG-2006 01:06:49 |
|-----------------------------|---------------------------|
ALLDEMO DemgName

1-1 Eth-Black
1-2 Eth-Asian
1-3 Eth-White
1-4 Eth-Filpno
1-5 Eth-Hispan
1-6 Eth-Other
1-7 Eth-NS
2-1 Sex-Female
2-2 Sex-Male
2-3 Sex-UNKN
3-1 Age <25
3-2 Age 25-35
3-3 Age 36-50
3-4 Age >50
3-5 Age UNKN

Number of cases read: 15 Number of cases listed: 15

* III. Combine blank and real records, aggregate .

ADD FILES
/FILE=*
/FILE=UNROLL.
AGGREGATE OUTFILE=*
/BREAK=ALLDEMO
/ SUCC TOT RCVD COMP
=SUM(SUCC TOT RCVD COMP).
FORMATS SUCC TOT RCVD COMP(F4).
RECODE SUCC TOT RCVD COMP(SYSMIS = 0).

TEMPORARY.
STRING DemgName (A10).
COMPUTE DemgName = VALUELABEL(ALLDEMO).
LIST ALLDEMO DemgName SUCC TOT RCVD COMP.
|-----------------------------|---------------------------|
|Output Created |19-AUG-2006 01:06:49 |
|-----------------------------|---------------------------|
ALLDEMO DemgName SUCC TOT RCVD COMP

1-1 Eth-Black 7 24 12 7
1-2 Eth-Asian 0 0 0 0
1-3 Eth-White 0 0 0 0
1-4 Eth-Filpno 1 7 2 2
1-5 Eth-Hispan 7 12 8 8
1-6 Eth-Other 0 7 0 0
1-7 Eth-NS 0 0 0 0
2-1 Sex-Female 12 36 18 13
2-2 Sex-Male 3 14 4 4
2-3 Sex-UNKN 0 0 0 0
3-1 Age <25 2 22 4 3
3-2 Age 25-35 8 20 13 9
3-3 Age 36-50 0 1 0 0
3-4 Age >50 5 7 5 5
3-5 Age UNKN 0 0 0 0

Number of cases read: 15 Number of cases listed: 15