Hello Folks,
I hope I have presented my problem sufficiently and I do apologize if its not so clear. I have perused www.spsstools.net but not exactly sure which version applies to my situation. Also, I am not sure how to ensure the presence of levels of the demo vars in the case they did not exist in particular Dataset A. For example, ETHGRP is reported having 7 levels in Dataset B but only three exist in the this particular example. So, I presume I would need to impute those levels nonexistent in Dataset A as cases in restructured Dataset B. Your time and assistance is greatly appreciated. Have a great day, Brandon <>-barnxieriaxtgmaildxotcom-<> remove x's in email And, the problem: I am attempting to restructure Dataset A below (var names in first line) containing three demographic vars and four numeric counts of enrollments. Its shape is the following: VALUE LABELS ETHGRP '1' 'Black' '2' 'Asian,Pacific Islander' '3' 'White, Non-Hispanic' '4' 'Filipino' '5' 'Hispanic' '6' 'Other' '7' 'Declined to State'. VALUE LABELS SEX 1 'Female' 2 'Male' 3 'Unknown'. VALUE LABELS AGEGRP 1 'Under 25' 2 '25 to 35' 3 '36 to 50' 4 'Over 50' 5 'Unknown'. ETHGRP,SEX,AGEGRP,SUCC,TOT,RCVD,COMP 1,1,1,1,4,2,1 1,1,2,3,12,7,3 1,1,4,1,1,1,1 1,2,1,0,4,0,0 1,2,2,2,2,2,2 1,2,3,0,1,0,0 4,2,1,1,7,2,2 5,1,2,3,6,4,4 5,1,4,4,6,4,4 6,1,1,0,7,0,0 I would like obtain Dataset B below (var names in first line) by collapsing the three demographic vars into a single var along with the new corresponding enrollment counts. However, its important that I preserve all levels of each demo var even if it has not existed in Dataset A. The following is the restructured dataset I hope to arrive at: ALLDEMO,SUCC,TOT,RCVD,COMP 11,7,24,12,7 12,0,0,0,0 13,0,0,0,0 14,1,7,2,2 15,7,12,8,8 16,0,7,0,0 17,0,0,0,0 21,12,36,18,13 22,3,14,4,4 23,0,0,0,0 31,2,22,4,3 32,8,20,13,9 33,0,1,0,0 34,5,7,5,5 35,0,0,0,0 |
At 02:10 PM 8/17/2006, Brandon Arnieri wrote:
>I hope I have presented my problem sufficiently and I do apologize if >its not so clear. Maybe somebody else has got this, in which case I apologize for being thick, but I don't understand it. Could you explain more? You have >ETHGRP,SEX,AGEGRP,SUCC,TOT,RCVD,COMP >1,1,1,1,4,2,1 >1,1,2,3,12,7,3 >1,1,4,1,1,1,1 >1,2,1,0,4,0,0 >1,2,2,2,2,2,2 >1,2,3,0,1,0,0 >4,2,1,1,7,2,2 >5,1,2,3,6,4,4 >5,1,4,4,6,4,4 >6,1,1,0,7,0,0 and you want >ALLDEMO,SUCC,TOT,RCVD,COMP >11,7,24,12,7 >12,0,0,0,0 >13,0,0,0,0 >14,1,7,2,2 >15,7,12,8,8 >16,0,7,0,0 >17,0,0,0,0 >21,12,36,18,13 >22,3,14,4,4 >23,0,0,0,0 >31,2,22,4,3 >32,8,20,13,9 >33,0,1,0,0 >34,5,7,5,5 >35,0,0,0,0 >collapsing the three demographic vars into a single var along with the >new corresponding enrollment counts Skipping for the moment "collapsing the three demographic vars into a single var", what do you mean by "the new corresponding enrollment counts"? For example, how are output rows 11,7,24,12,7 22,3,14,4,4 calculated from the input? Where there do seem to be correspondences, if 14,1,7,2,2 comes from 4,2,1,1,7,2,2 how is 14 calculated from 4,2,1? (OK, I'll probably figure it out as soon as I send this.) Thanks, and good luck, Richard |
In reply to this post by Brandon Arnieri
ALLDEMO is a string concatanation, i j where i represents the i-th demo var at its j-value: (i used numbers to preserve the order)
so ALLDEMO = 11 denotes demo var1 (ethnic) and first level (black), and ALLDEMO =12 denotes demo var1 (ethnic) and second level (asian), .... ALLDEMO =22 denotes demo var2 (gender) and its second level (male), ... ALLDEMO =35 denotes demo var3 (gender) and its fifth level (unknown). this probably isn't the best approach...however, this end result is just the most convenient for my reporting module. sorry if i didn't make this clearer in my earlier post. -----Original Message----- From: Simon Freidin [mailto:[hidden email]] Sent: Thu 8/17/06 4:55 PM To: Brandon Arnieri Cc: Subject: Re: Restructure dataset with 3 demo vars into single var+ little tricky part explain how you create ALLDEMO from ETHGRP,SEX,AGEGRP (to the list!) or you won't get any answers cheers Simon At 04:10 AM 18/08/2006, you wrote: >Hello Folks, > >I hope I have presented my problem sufficiently and I do apologize if >its not so clear. I have perused www.spsstools.net but not exactly sure >which version applies to my situation. Also, I am not sure how to >ensure the presence of levels of the demo vars in the case they did not >exist in particular Dataset A. For example, ETHGRP is reported having 7 >levels in Dataset B but only three exist in the this particular example. >So, I presume I would need to impute those levels nonexistent in Dataset >A as cases in restructured Dataset B. > >Your time and assistance is greatly appreciated. > >Have a great day, > >Brandon ><>-barnxieriaxtgmaildxotcom-<> >remove x's in email > >And, the problem: > >I am attempting to restructure Dataset A below (var names in first line) >containing three demographic vars and four numeric counts of >enrollments. Its shape is the following: > >VALUE LABELS ETHGRP >'1' 'Black' '2' 'Asian,Pacific Islander' '3' 'White, Non-Hispanic' '4' >'Filipino' >'5' 'Hispanic' '6' 'Other' '7' 'Declined to State'. >VALUE LABELS SEX 1 'Female' 2 'Male' 3 'Unknown'. >VALUE LABELS AGEGRP 1 'Under 25' 2 '25 to 35' 3 '36 to 50' 4 'Over 50' 5 >'Unknown'. > >ETHGRP,SEX,AGEGRP,SUCC,TOT,RCVD,COMP >1,1,1,1,4,2,1 >1,1,2,3,12,7,3 >1,1,4,1,1,1,1 >1,2,1,0,4,0,0 >1,2,2,2,2,2,2 >1,2,3,0,1,0,0 >4,2,1,1,7,2,2 >5,1,2,3,6,4,4 >5,1,4,4,6,4,4 >6,1,1,0,7,0,0 > >I would like obtain Dataset B below (var names in first line) by >collapsing the three demographic vars into a single var along with the >new corresponding enrollment counts. However, its important that I >preserve all levels of each demo var even if it has not existed in >Dataset A. The following is the restructured dataset I hope to arrive >at: > >ALLDEMO,SUCC,TOT,RCVD,COMP >11,7,24,12,7 >12,0,0,0,0 >13,0,0,0,0 >14,1,7,2,2 >15,7,12,8,8 >16,0,7,0,0 >17,0,0,0,0 >21,12,36,18,13 >22,3,14,4,4 >23,0,0,0,0 >31,2,22,4,3 >32,8,20,13,9 >33,0,1,0,0 >34,5,7,5,5 >35,0,0,0,0 Research Database Manager and Analyst Melbourne Institute of Applied Economic and Social Research The University of Melbourne Melbourne VIC 3010 Australia New Tel: (03) 8344 2085 New Fax: (03) 8344 2111 http://www.melbourneinstitute.com/hilda/ |
Brandon,
Much better explanation but not yet complete. Please verify that 1) you want to aggregate by ethgroup and sum across these variables: SUCC,TOT,RCVD,COMP. I think this is what you are indicating but that little detail is left out. And 2) create new records with values of 0 for SUCC,TOT,RCVD,COMP for ethgroup categories not represented in the original datafile, and 3) then create a new variable, which could be Alldemo=10+ethgroup, or Alldemo=concat('1',string(ethgroup(f1.0)). Depending on whether you prefer alldemo to be numeric or string. Is this all correct? Gene Maguin |
In reply to this post by Brandon Arnieri
At 11:17 PM 8/17/2006, Brandon Arnieri wrote, following up:
>ALLDEMO is a string concatanation, i j where i represents the i-th >demo var at its j-value: (i used numbers to preserve the order) >so ALLDEMO = 11 denotes demo var1 (ethnic) and first level (black), >and ALLDEMO =12 denotes demo var1 (ethnic) and second level (asian), > >.... >ALLDEMO =22 denotes demo var2 (gender) and its second level (male), >... >ALLDEMO =35 denotes demo var3 ([agegrp]) and its fifth level >(unknown). So, like this? I'm making ALLDEMO a 3-character string, with a hyphen between category and value: VALUE LABELS ALLDEMO '1-1' 'Eth-Black' '1-2' 'Eth-Asian' '1-3' 'Eth-White' '1-4' 'Eth-Filpno' '1-5' 'Eth-Hispan' '1-6' 'Eth-Other' '1-7' 'Eth-NS' '2-1' 'Sex-Female' '2-2' 'Sex-Male' '2-3' 'Sex-UNKN' '3-1' 'Age <25' '3-2' 'Age 25-35' '3-3' 'Age 36-50' '3-4' 'Age >50' '3-5' 'Age UNKN'. So, a black male is in BOTH class 11 (ethnic=black) AND class 22 (gender=male), right? >sorry if i didn't make this clearer in my earlier post. Thanks for improving it. We wouldn't have had a CLUE from what you posted before. Not a CLUE. If you'd even said what, say, ALLDEMO=35 meant, we'd maybe have had a chance. Anyway, I think this does it. This is SPSS draft output. Listing (but not the logic) uses function VALUELABEL, which was introduced in SPSS 14. * ................................................ . GET FILE=DATAST_A. LIST. |-----------------------------|---------------------------| |Output Created |19-AUG-2006 01:06:48 | |-----------------------------|---------------------------| ETHGRP SEX AGEGRP SUCC TOT RCVD COMP 1 1 1 1 4 2 1 1 1 2 3 12 7 3 1 1 4 1 1 1 1 1 2 1 0 4 0 0 1 2 2 2 2 2 2 1 2 3 0 1 0 0 4 2 1 1 7 2 2 5 1 2 3 6 4 4 5 1 4 4 6 4 4 6 1 1 0 7 0 0 Number of cases read: 10 Number of cases listed: 10 * I. Write each input record as THREE records, . * one for its category on each of the three . * demographic variables. . STRING ALLDEMO (A3). VALUE LABELS ALLDEMO '1-1' 'Eth-Black' '1-2' 'Eth-Asian' '1-3' 'Eth-White' '1-4' 'Eth-Filpno' '1-5' 'Eth-Hispan' '1-6' 'Eth-Other' '1-7' 'Eth-NS' '2-1' 'Sex-Female' '2-2' 'Sex-Male' '2-3' 'Sex-UNKN' '3-1' 'Age <25' '3-2' 'Age 25-35' '3-3' 'Age 36-50' '3-4' 'Age >50' '3-5' 'Age UNKN'. VECTOR DEMGVAR = ETHGRP TO AGEGRP. LOOP #DEMCAT = 1 TO 3. . COMPUTE #VALUE = DEMGVAR(#DEMCAT). . COMPUTE ALLDEMO = CONCAT(STRING(#DEMCAT,F1),'-', STRING(#VALUE ,F1)). . XSAVE OUTFILE=UNROLL /KEEP=ALLDEMO SUCC TOT RCVD COMP. END LOOP. EXECUTE. GET FILE=UNROLL. TEMPORARY. STRING DemgName (A10). COMPUTE DemgName = VALUELABEL(ALLDEMO). LIST ALLDEMO DemgName SUCC TOT RCVD COMP. |-----------------------------|---------------------------| |Output Created |19-AUG-2006 01:06:49 | |-----------------------------|---------------------------| ALLDEMO DemgName SUCC TOT RCVD COMP 1-1 Eth-Black 1 4 2 1 2-1 Sex-Female 1 4 2 1 3-1 Age <25 1 4 2 1 1-1 Eth-Black 3 12 7 3 2-1 Sex-Female 3 12 7 3 3-2 Age 25-35 3 12 7 3 1-1 Eth-Black 1 1 1 1 2-1 Sex-Female 1 1 1 1 3-4 Age >50 1 1 1 1 1-1 Eth-Black 0 4 0 0 2-2 Sex-Male 0 4 0 0 3-1 Age <25 0 4 0 0 1-1 Eth-Black 2 2 2 2 2-2 Sex-Male 2 2 2 2 3-2 Age 25-35 2 2 2 2 1-1 Eth-Black 0 1 0 0 2-2 Sex-Male 0 1 0 0 3-3 Age 36-50 0 1 0 0 1-4 Eth-Filpno 1 7 2 2 2-2 Sex-Male 1 7 2 2 3-1 Age <25 1 7 2 2 1-5 Eth-Hispan 3 6 4 4 2-1 Sex-Female 3 6 4 4 3-2 Age 25-35 3 6 4 4 1-5 Eth-Hispan 4 6 4 4 2-1 Sex-Female 4 6 4 4 3-4 Age >50 4 6 4 4 1-6 Eth-Other 0 7 0 0 2-1 Sex-Female 0 7 0 0 3-1 Age <25 0 7 0 0 Number of cases read: 30 Number of cases listed: 30 * II. Create a blank record for each possible . * demographic category. . NEW FILE. DATA LIST FREE /ALLDEMO (A3) DemgName(A10). BEGIN DATA '1-1' 'Eth-Black' '1-2' 'Eth-Asian' '1-3' 'Eth-White' '1-4' 'Eth-Filpno' '1-5' 'Eth-Hispan' '1-6' 'Eth-Other' '1-7' 'Eth-NS' '2-1' 'Sex-Female' '2-2' 'Sex-Male' '2-3' 'Sex-UNKN' '3-1' 'Age <25' '3-2' 'Age 25-35' '3-3' 'Age 36-50' '3-4' 'Age >50' '3-5' 'Age UNKN'. END DATA. >Warning # 522 >An unexpected end of file has been found in the middle of reading a case. >The partial case will be ignored. Check your input for a possible missing >record. >Command line: 477 Current case: 16 Current splitfile group: 1 . /**/ LIST. |-----------------------------|---------------------------| |Output Created |19-AUG-2006 01:06:49 | |-----------------------------|---------------------------| ALLDEMO DemgName 1-1 Eth-Black 1-2 Eth-Asian 1-3 Eth-White 1-4 Eth-Filpno 1-5 Eth-Hispan 1-6 Eth-Other 1-7 Eth-NS 2-1 Sex-Female 2-2 Sex-Male 2-3 Sex-UNKN 3-1 Age <25 3-2 Age 25-35 3-3 Age 36-50 3-4 Age >50 3-5 Age UNKN Number of cases read: 15 Number of cases listed: 15 * III. Combine blank and real records, aggregate . ADD FILES /FILE=* /FILE=UNROLL. AGGREGATE OUTFILE=* /BREAK=ALLDEMO / SUCC TOT RCVD COMP =SUM(SUCC TOT RCVD COMP). FORMATS SUCC TOT RCVD COMP(F4). RECODE SUCC TOT RCVD COMP(SYSMIS = 0). TEMPORARY. STRING DemgName (A10). COMPUTE DemgName = VALUELABEL(ALLDEMO). LIST ALLDEMO DemgName SUCC TOT RCVD COMP. |-----------------------------|---------------------------| |Output Created |19-AUG-2006 01:06:49 | |-----------------------------|---------------------------| ALLDEMO DemgName SUCC TOT RCVD COMP 1-1 Eth-Black 7 24 12 7 1-2 Eth-Asian 0 0 0 0 1-3 Eth-White 0 0 0 0 1-4 Eth-Filpno 1 7 2 2 1-5 Eth-Hispan 7 12 8 8 1-6 Eth-Other 0 7 0 0 1-7 Eth-NS 0 0 0 0 2-1 Sex-Female 12 36 18 13 2-2 Sex-Male 3 14 4 4 2-3 Sex-UNKN 0 0 0 0 3-1 Age <25 2 22 4 3 3-2 Age 25-35 8 20 13 9 3-3 Age 36-50 0 1 0 0 3-4 Age >50 5 7 5 5 3-5 Age UNKN 0 0 0 0 Number of cases read: 15 Number of cases listed: 15 |
Free forum by Nabble | Edit this page |