Hi Listers,
I'm in v13. I have 2 datasets, each with 144 segments created by a statement such as "If v1=1 & v2=3 & v3=2, then this is segment 1," etc. There are 144 unique combinations based on 6 levels in v1, 6 in v2, and 4 in v3. Within each of these segments will be some number of cases. For the cases in each segment, I would like to sum the values of a numeric variable. I would repeat this task for all 144 segments in each of the 2 datasets.

Here's the next step: take the summed values for each segment and compare them across the 2 datasets to see by what factor they differ. Each segment would have a unique factor. Final step: apply this factor to the segments in one dataset to "adjust" them.

Any thoughts on 1. automating the segmenting and summing task, and/or 2. creating and applying the factor across the datasets would be greatly appreciated.

Thanks!
Nico
--
Nico Peruzzi, Ph.D.
Nico,
As I understand it, this is basically an aggregate and a match files problem. I'm assuming you have already created the segment variable. In each dataset, aggregate with segment number as a break variable and sum the values of the variable of interest. Save the results for each file in a NEW file. Then match the two new datasets together using segment number as the By variable.

Aggregate outfile=* /break=segment /sumx=sum(x).

From this point on, what you want to do isn't well enough defined for me to be helpful. You say "... Compare them across the 2 datasets to see by what factor they are different. Each segment would have a unique factor." Do you mean the difference between corresponding segments? The ratio? Either way, I'm guessing that you'd need to do a match files operation with segment as the By variable.

You say "Final step: apply this factor to the segments in one dataset to 'adjust' them." Is this done to the non-aggregated data? Aggregated data?

Gene Maguin
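To make that concrete, here is a minimal sketch of the aggregate-and-save step, assuming the two original files are saved as file1.sav and file2.sav and the variable to be summed is x (all of these names are placeholders, not names from the actual data):

* Sum x within each segment of the first file; one output case per segment.
GET FILE='file1.sav'.
AGGREGATE OUTFILE='agg1.sav'
  /BREAK=segment
  /sumxf1=SUM(x).

* Same for the second file, with a different name for the summed variable.
GET FILE='file2.sav'.
AGGREGATE OUTFILE='agg2.sav'
  /BREAK=segment
  /sumxf2=SUM(x).

AGGREGATE writes the new files already sorted by the break variable, which is what a later MATCH FILES ... BY segment step needs.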
Gene,
Thanks for the reply. To begin, I have not yet created the segment or sum variable. I could use help with how to automate the recoding process to create the segment variable, so that I don't hand-code 144 recode statements. Once I have the segment variable, I would like to automate the task of select if segment=1, followed by case summaries, summing the numeric variable. Again, I know how to do it by hand 144 times, but I figure there has to be a better way.

Thanks for the aggregate tip - that makes sense to give me a new dataset with the variables segment, sum1, and sum2. At this point, I would calculate a ratio between the 2 sum variables. I would then need to apply the ratio attached to each segment to one of the original datasets (the unaggregated data). Imagine that the original data has 5 cases in each of the 144 segments, and a numeric variable with a value for each case. I would then want to apply the ratio for each segment to each of the cases that falls within that segment to "adjust" its value.

It sounds like you are suggesting using match files to line up the ratio for each segment next to the values for each case. If I get this far, I can use compute to get my final value. I'm sure the match files syntax is simple, but I'm not that familiar with it - could you please explain?

Thanks,
nico
--
Nico Peruzzi, Ph.D.
Nico,
>>To begin, I have not yet created the segment or sum variable. I could use help with how to automate the recoding process to create the segment variable, so that I don't hand-code 144 recode statements.

This may not work, but you said "There are 144 unique combinations based on 6 levels in v1, 6 in v2, and 4 in v3." If v1 and v2 have values of 1 thru 6 and v3 has values of 1 thru 4, you could compute a segment variable in the following manner:

Compute segment=6*4*(v1-1)+4*(v2-1)+v3.

If v1=4, v2=6, and v3=2 for one case, segment=24*3+4*5+2=94. If v1, v2 or v3 do not have inclusive values, then maybe you could recode them to create new variables that do.

>>Once I have the segment variable, I would like to automate the task of select if segment=1, followed by case summaries, summing the numeric variable. Again, I know how to do it by hand 144 times, but I figure there has to be a better way.

I'm not sure what you mean by case summaries. Assuming you mean something that can be done by frequencies or descriptives, the thing to do is

Sort cases by segment.
Split file by segment.
Descriptives <var list>.

>>At this point, I would calculate a ratio between the 2 sum variables.

I apologize for an error in my earlier posting. When you aggregate the two files, make sure to give the summed variables different names so that the following match files doesn't write one on top of the other. For instance, in one file the variable could be sumxf1 and in the other file sumxf2. Given where you're going, when you match the two files together, the only variables you may need to keep are segment, sumxf1 and sumxf2. So let Ratio be sumxf1/sumxf2. Read up on the match files command in the syntax reference, but here:

Match files file=file1/file=file2/by segment/keep=segment sumxf1 sumxf2.

Make your computation and save the file you just put together; call it RatioFile. All you need to keep are segment and ratio.

>>I would then need to apply the ratio attached to each segment to one of the original datasets (the unaggregated data). Imagine that the original data has 5 cases in each of the 144 segments, and a numeric variable with a value for each case. I would then want to apply the ratio for each segment to each of the cases that falls within that segment to "adjust" its value.

Next, match RatioFile with your original file 1 by using the table subcommand. As in,

Match files file=file1/table=ratiofile/by segment.

Then you can do the computations that follow.

Gene Maguin
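Pulling Gene's pieces together, a sketch of the remaining steps might look like the following; the file names (agg1.sav, agg2.sav, ratiofile.sav, file1.sav) and the variables x and adjx are placeholders, and the two aggregated files are assumed to contain segment with sumxf1 and sumxf2 respectively, as above:

* Match the two aggregated files by segment and compute the ratio.
MATCH FILES FILE='agg1.sav'
  /FILE='agg2.sav'
  /BY segment
  /KEEP=segment sumxf1 sumxf2.
COMPUTE ratio=sumxf1/sumxf2.
SAVE OUTFILE='ratiofile.sav' /KEEP=segment ratio.

* Attach each segment's ratio to every case in the original file
* (TABLE= spreads one ratio value across all cases in that segment)
* and adjust the case-level variable.
GET FILE='file1.sav'.
SORT CASES BY segment.
MATCH FILES FILE=*
  /TABLE='ratiofile.sav'
  /BY segment.
COMPUTE adjx=x*ratio.
EXECUTE.

If a segment is missing from one of the files, or sumxf2 is 0 for a segment, ratio will come out system-missing there, so a quick LIST of ratiofile.sav is worth doing before applying the adjustment.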
At 05:54 PM 8/28/2007, Gene Maguin wrote:
>You said "There are 144 unique combinations based on 6 levels in v1, 6
>in v2, and 4 in v3." If v1 and v2 have values of 1 thru 6 and v3 has
>values of 1 thru 4, you could compute a segment variable in the
>following manner:
>
>Compute segment=6*4*(v1-1)+4*(v2-1)+v3.
>
>If v1=4, v2=6, and v3=2 for one case, segment=24*3+4*5+2=94.
>
>If v1, v2 or v3 do not have inclusive values, then maybe you could
>recode them to create new variables that do.

That will definitely do it, though the resulting variable won't be easy to read. I would consider,

a.) Multiplying by 100 and 10, instead of 24 and 4, to put the v-values in separate digits, and more readable. That would also catch at least some cases where v-values are out of range.

b.) Using a string for the segment variable, to use separators and punctuation for readability.

c.) Not creating a segment variable at all. SPLIT FILE, AGGREGATE, and MATCH FILES are perfectly happy with multiple-variable keys, like the triplet v1, v2, v3.

Anyway, to illustrate a.) and b.), here's SPSS 15 draft output (WRR: not saved separately):

Output Created   28-AUG-2007 19:15:01

CaseNum v1 v2 v3
   01    2  3  1
   02    2  3  4
   03    4  3  1
   04    4  3  4
   05    6  3  1
   06    6  3  4

Number of cases read:  6    Number of cases listed:  6

NUMERIC segment  (F04) /segment2 (F05).
STRING  segmentC (A07) /segmentD (A17).

Compute segment  = 6*4*(v1-1) + 4*(v2-1) + v3.
Compute segment2 = 1E2*v1 + 1E1*v2 + v3.

Compute segmentC = CONCAT(LTRIM(STRING(v1,F2)),'.').
Compute segmentC = CONCAT(RTRIM(segmentC),LTRIM(STRING(v2,F2)),'.').
Compute segmentC = CONCAT(RTRIM(segmentC),LTRIM(STRING(v3,F2))).

Compute segmentD = CONCAT('v1=',LTRIM(STRING(v1,F2)),';').
Compute segmentD = CONCAT(RTRIM(segmentD),'v2=',LTRIM(STRING(v2,F2)),';').
Compute segmentD = CONCAT(RTRIM(segmentD),'v3=',LTRIM(STRING(v3,F2))).

LIST.

List
Output Created   28-AUG-2007 19:15:01

CaseNum v1 v2 v3 segment segment2 segmentC segmentD
   01    2  3  1      33      231  2.3.1   v1=2;v2=3;v3=1
   02    2  3  4      36      234  2.3.4   v1=2;v2=3;v3=4
   03    4  3  1      81      431  4.3.1   v1=4;v2=3;v3=1
   04    4  3  4      84      434  4.3.4   v1=4;v2=3;v3=4
   05    6  3  1     129      631  6.3.1   v1=6;v2=3;v3=1
   06    6  3  4     132      634  6.3.4   v1=6;v2=3;v3=4

Number of cases read:  6    Number of cases listed:  6

===================
APPENDIX: Test data
===================
INPUT PROGRAM.
.  NUMERIC CaseNum (N2).
.  LEAVE   CaseNum.
.  NUMERIC v1 v2 v3 (F2).
.  LEAVE   v1 v2 v3.
.  COMPUTE CaseNum = 0.
.  LOOP v1 = 2 TO 6 BY 2.
.     LOOP v2 = 3 TO 5 BY 4.
.        LOOP v3 = 1 TO 4 BY 3.
.           COMPUTE CaseNum = CaseNum + 1.
.           END CASE.
.        END LOOP.
.     END LOOP.
.  END LOOP.
END FILE.
END INPUT PROGRAM.
LIST.
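And for completeness, a minimal sketch of option c.), the whole workflow keyed on the triplet v1, v2, v3 with no segment variable at all, using the same placeholder file and variable names as earlier in the thread:

* Sum x within each v1/v2/v3 combination in each file.
GET FILE='file1.sav'.
AGGREGATE OUTFILE='agg1.sav' /BREAK=v1 v2 v3 /sumxf1=SUM(x).
GET FILE='file2.sav'.
AGGREGATE OUTFILE='agg2.sav' /BREAK=v1 v2 v3 /sumxf2=SUM(x).

* Match on all three keys at once and compute the ratio.
MATCH FILES FILE='agg1.sav' /FILE='agg2.sav' /BY v1 v2 v3.
COMPUTE ratio=sumxf1/sumxf2.
SAVE OUTFILE='ratiofile.sav' /KEEP=v1 v2 v3 ratio.

* Spread the ratio back onto the case-level data and adjust.
GET FILE='file1.sav'.
SORT CASES BY v1 v2 v3.
MATCH FILES FILE=* /TABLE='ratiofile.sav' /BY v1 v2 v3.
COMPUTE adjx=x*ratio.
EXECUTE.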