automating a repetitive filter - sum task


automating a repetitive filter - sum task

Nico Peruzzi
Hi Listers,

I'm in v 13.

I have 2 datasets, each with 144 segments that are created by a statement
such as "If v1=1 & v2=3 & v3=2" then this is segment 1, etc.  There are 144
unique combinations based on 6 levels in v1, 6 in v2, and 4 in v3.

Within each of these segments will be some number of cases.  For these
select cases in each segment, I would like to sum the values in a numeric
variable.

I would repeat this task for all 144 segments in each of the 2 datasets.

Here's the next step: take the summed values for each segment and compare
them across the 2 datasets to see by what factor they are different.  Each
segment would have a unique factor.

Final step: apply this factor to the segments in one dataset to "adjust"
them.

Any thoughts on 1. automating the segmenting and summing task, and/or 2. on
creating and applying the factor across the datasets would be greatly
appreciated.

Thanks!  Nico

--
Nico Peruzzi, Ph.D.

Re: automating a repetitive filter - sum task

Maguin, Eugene
Nico,

As I understand it, this is basically an aggregate and a match files
problem. I'm assuming you have already created the segment variable. In each
dataset aggregate with segment number as a break variable and sum values for
the variable of interest. Save the results for each file in a NEW file. Then
match the two new datasets together using segment number as the By variable.


Aggregate outfile=* /break=segment /sumx=sum(x).
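
Spelled out a bit more fully for one of the two files, a sketch (the file
and variable names here are placeholders, not from the thread; the variable
of interest is assumed to be x):

Get file='data1.sav'.
Aggregate outfile='agg1.sav'
  /break=segment
  /sumx=sum(x).

Repeat the same aggregate for the second dataset, saving to a second new file.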


From this point on, what you want to do isn't well enough defined for me to
be helpful.

You say "... Compare them across the 2 datasets to see by what factor they
are different.  Each segment would have a unique factor."

Do you mean the difference between corresponding segments? The ratio? Either
way, I'm guessing that you'd need to do a match files operation with segment
as the By variable.

You say "Final step: apply this factor to the segments in one dataset to
'adjust'
them."

Is this done to the non-aggregated data? Aggregated data?

Gene Maguin

Re: automating a repetitive filter - sum task

Nico Peruzzi
Gene,

Thanks for the reply.

To begin, I have not yet created the segment or sum variable.  I could use
help with how to automate the recoding process to create the segment
variable, so that I don't hand code 144 recode statements.

Once I have the segment variable, I would like to automate the task of
select if segment=1, followed by case summaries, sum the numeric variable.
Again, I know how to do it by hand 144 times, but I figure there has to be a
better way.

Thanks for the aggregate tip - that makes sense to give me a new dataset
with variables: segment, sum1, and sum2.

At this point, I would calculate a ratio between the 2 sum variables.

I would then need to apply the ratio attached to each segment to one of the
original datasets (unaggregated data).  Imagine that the original data has 5
cases per each of the 144 segments, and it has a numeric variable with a
value for each case.  I would then want to apply the ratio for each segment
to each of the cases that falls within that segment to "adjust" its value.

It sounds like you are suggesting using match files to line up the ratio for
each segment next to the values for each case.  If I get this far, I can use
compute to get my final value.  I'm sure the match cases syntax is simple,
but I'm not that familiar with it - could you please explain?

Thanks, nico


--
Nico Peruzzi, Ph.D.

Re: automating a repetitive filter - sum task

Maguin, Eugene
Nico,

>>To begin, I have not yet created the segment or sum variable.  I could use
help with how to automate the recoding process to create the segment
variable, so that I don't hand code 144 recode statements.

This may not work but you said "There are 144 unique combinations based on 6
levels in v1, 6 in v2, and 4 in v3." If v1 and v2 have values of 1 thru 6
and v3 has values of 1 thru 4, you could compute a segment variable in the
following manner.

Compute segment=6*4*(v1-1)+4*(v2-1)+v3.

If v1=4, v2=6, and v3=2 for one case, segment=24*3+4*5+2=94.

If v1, v2 or v3 do not have consecutive values starting at 1, then maybe you
could recode them to create new variables that do.


>>Once I have the segment variable, I would like to automate the task of
select if segment=1, followed by case summaries, sum the numeric variable.
Again, I know how to do it by hand 144 times, but I figure there has to be a
better way.

I'm not sure what you mean by case summaries. Assuming you mean something
that can be done by frequencies or descriptives, the thing to do is

Sort cases by segment.
Split file by segment.
Descriptives variables=<var list> /statistics=sum.

>>At this point, I would calculate a ratio between the 2 sum variables.

I apologize for an error in my earlier posting. When you aggregate the two
files, make sure to give the summed variables different names so that the
following match files doesn't write one on top of the other. For instance,
in one file the variable could be sumxf1 and in the other file, sumxf2.

Given where you're going, when you match the two files together, the only
variables you may need to keep are segment, sumxf1 and sumxf2. So let Ratio
be sumxf1/sumxf2. Read up on the match files command in the syntax ref. But
here:

Match files file=file1/file=file2/by segment/keep=segment sumxf1 sumxf2.


Make your computation, then save the file you just put together; call it
RatioFile. All you need to keep are segment and ratio.

>>I would then need to apply the ratio attached to each segment to one of
the original datasets (unaggregated data).  Imagine that the original data
has 5 cases per each of the 144 segments, and it has a numeric variable with
a value for each case.  I would then want to apply the ratio for each
segment to each of the cases that falls within that segment to "adjust" its
value.

Next, match RatioFile with your original file 1 by using the table
subcommand. As in,

Match files file=file1/table=ratiofile/by segment.

Then you can do the computations that follow.
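
Concretely, after that table match every case in file 1 carries its
segment's ratio, so the adjustment is a single compute (the names x, ratio,
and xadj are assumptions for illustration, not from the thread):

Compute xadj = x*ratio.
Execute.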


Gene Maguin





Re: automating a repetitive filter - sum task

Richard Ristow
At 05:54 PM 8/28/2007, Gene Maguin wrote:

>You said "There are 144 unique combinations based on 6 levels in v1, 6
>in v2, and 4 in v3." If v1 and v2 have values of 1 thru 6 and v3 has
>values of 1 thru 4, you could compute a segment variable in the
>following manner:
>
>Compute segment=6*4*(v1-1)+4*(v2-1)+v3.
>
>If v1=4, v2=6, and v3=2 for one case, segment=24*3+4*5+2=94.
>
>If v1, v2 or v3 do not have consecutive values starting at 1, then maybe
>you could recode them to create new variables that do.

That will definitely do it, though the resulting variable won't be easy
to read. I would consider,

a.) Multiplying by 100 and 10, instead of 24 and 4, to put the v-values
in separate digits, making the result more readable. That would also catch
at least some cases where v-values are out of range.

b.) Using a string for the segment variable, to use separators and
punctuation for readability.

c.) Not creating a segment variable at all. SPLIT FILE, AGGREGATE, and
MATCH FILES are perfectly happy with multiple variable keys, like the
triplet v1, v2, v3.
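
To illustrate c.), a sketch of the same aggregate-and-match steps keyed on
the triplet directly (file names here are placeholders, not from the thread;
MATCH FILES wants both inputs sorted by the key, hence the SORT CASES):

Sort cases by v1 v2 v3.
Aggregate outfile='agg1.sav' /break=v1 v2 v3 /sumx=sum(x).
Match files file='file1.sav' /table='ratiofile.sav' /by v1 v2 v3.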

Anyway, to illustrate a.) and b.), here's SPSS 15 draft output (WRR:
not saved separately):

|-----------------------------|---------------------------|
|Output Created               |28-AUG-2007 19:15:01       |
|-----------------------------|---------------------------|
CaseNum v1 v2 v3

    01    2  3  1
    02    2  3  4
    03    4  3  1
    04    4  3  4
    05    6  3  1
    06    6  3  4

Number of cases read:  6    Number of cases listed:  6


NUMERIC segment  (F04)
        /segment2 (F05).

STRING  segmentC (A07)
        /segmentD (A17).

Compute segment  = 6*4*(v1-1)+ 4*(v2-1)+ v3.
Compute segment2 = 1E2*v1    + 1E1*v2  + v3.

Compute segmentC = CONCAT(LTRIM(STRING(v1,F2)) ,'.').
Compute segmentC = CONCAT(RTRIM(segmentC),
                           LTRIM(STRING(v2,F2)) ,'.').
Compute segmentC = CONCAT(RTRIM(segmentC),
                           LTRIM(STRING (v3,F2))).


Compute segmentD = CONCAT('v1=',LTRIM(STRING(v1,F2)) ,';').
Compute segmentD = CONCAT(RTRIM(segmentD),
                           'v2=',LTRIM(STRING(v2,F2)) ,';').
Compute segmentD = CONCAT(RTRIM(segmentD),
                           'v3=',LTRIM(STRING (v3,F2))).

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |28-AUG-2007 19:15:01       |
|-----------------------------|---------------------------|
CaseNum v1 v2 v3 segment segment2 segmentC segmentD

    01    2  3  1     33      231  2.3.1    v1=2;v2=3;v3=1
    02    2  3  4     36      234  2.3.4    v1=2;v2=3;v3=4
    03    4  3  1     81      431  4.3.1    v1=4;v2=3;v3=1
    04    4  3  4     84      434  4.3.4    v1=4;v2=3;v3=4
    05    6  3  1    129      631  6.3.1    v1=6;v2=3;v3=1
    06    6  3  4    132      634  6.3.4    v1=6;v2=3;v3=4

Number of cases read:  6    Number of cases listed:  6



===================
APPENDIX: Test data
===================
INPUT PROGRAM.
.  NUMERIC CaseNum  (N2).
.  LEAVE   CaseNum.
.  NUMERIC v1 v2 v3 (F2).
.  LEAVE   v1 v2 v3.
.  COMPUTE CaseNum = 0.
.  LOOP       v1 = 2 TO 6 BY 2.
.     LOOP    v2 = 3 TO 5 BY 4.
.        LOOP v3 = 1 TO 4 BY 3.
.           COMPUTE CaseNum = CaseNum + 1.
.           END CASE.
.        END LOOP.
.     END LOOP.
.  END LOOP.
END FILE.
END INPUT PROGRAM.

LIST.