SPSSX Discussion

Combination of AGGREGATE and CASESTOVARS?

Classic

List

Threaded

7 messages Options

Mike Donatello

Combination of AGGREGATE and CASESTOVARS?

Hi folks,

Having some trouble remembering back to my syntax days and was hoping I
could get some help.

I have a data file in which each record comprises a specific news story,
with publication date and codes for various content elements. Each of the
content elements may contain one of several values, but the values are all
single-select (i.e., no multiple values on a particular content variable).
The number of stories (and records) for a particular date is variable.

Record_ID Date Story_Type Medium Focus Tone
109065 1-Oct-10 Standard story Network_TV Primary focus 7
109066 1-Oct-10 Standard story Network_TV Primary focus 6
109067 1-Oct-10 Standard story Newspaper Primary focus 6
109070 1-Oct-10 Standard story Newspaper Primary focus 4
109071 1-Oct-10 Interview Newspaper Secondary focus 5
109072 1-Oct-10 Interview Wire service Primary focus 5
109073 2-Oct-10 Interview Wire service Primary focus 3
109074 2-Oct-10 Standard story Blog Secondary focus 1
109076 2-Oct-10 Standard story Newspaper Secondary focus 4
109077 3-Oct-10 Brief Wire service Primary focus 4

What Iâd like to do is to reorder these data so that each record represents
a particular date, followed by counts of specific codes for each content
variable. So, my ideal format would look something like:

Date Total_Stories Std_stories Intvws Briefs Network_TV
Newspaper ETC.
1-Oct-10 6 4 2 0 2 3
2-Oct-10 3 2 1 0 0 1
3-Oct-10 1 0 0 1 0 0

Basically, everything gets collapsed into frequencies. The only exception
to this is for the scalar Tone variable. Rather than a count, I need the
mean for that variable across all records with the same date.

I tried the AGGREGATE command and could get the count of total stories by
date (because each record is one story), as well as the mean of the scale
variable. Everything else is messed up, because aggregation should be
conditional based on the value of the variables. (That is, all interviews
are aggregated, but are not mixed with briefs.) I tried CASESTOVARS, but
then I end up with inconsistent numbers of variables per record, because the
number of original records being collapsed is not constant. So, for
example, the record for 1 OCT would have Story_Type.1 to Story_Type.6, while
the record for 3 OCT would have only Story_Type.1. Ditto for the other
variables.

Can anyone point me in the right direction here? Iâm trying to figure out
how to do this all with loops, but it seems very messy.

Thanks,

Mike Donatello

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

ViAnn Beadle

Re: Combination of AGGREGATE and CASESTOVARS?

Why not use crosstabs?

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Mike Donatello
Sent: Monday, January 17, 2011 11:28 AM
To: [hidden email]
Subject: Combination of AGGREGATE and CASESTOVARS?

Hi folks,

Having some trouble remembering back to my syntax days and was hoping I could get some help.

I have a data file in which each record comprises a specific news story, with publication date and codes for various content elements. Each of the content elements may contain one of several values, but the values are all single-select (i.e., no multiple values on a particular content variable).
The number of stories (and records) for a particular date is variable.

Record_ID Date Story_Type Medium Focus Tone
109065 1-Oct-10 Standard story Network_TV Primary focus 7
109066 1-Oct-10 Standard story Network_TV Primary focus 6
109067 1-Oct-10 Standard story Newspaper Primary focus 6
109070 1-Oct-10 Standard story Newspaper Primary focus 4
109071 1-Oct-10 Interview Newspaper Secondary focus 5
109072 1-Oct-10 Interview Wire service Primary focus 5
109073 2-Oct-10 Interview Wire service Primary focus 3
109074 2-Oct-10 Standard story Blog Secondary focus 1
109076 2-Oct-10 Standard story Newspaper Secondary focus 4
109077 3-Oct-10 Brief Wire service Primary focus 4

What Iâ€™d like to do is to reorder these data so that each record represents a particular date, followed by counts of specific codes for each content variable. So, my ideal format would look something like:

Date Total_Stories Std_stories Intvws Briefs Network_TV
Newspaper ETC.
1-Oct-10 6 4 2 0 2 3
2-Oct-10 3 2 1 0 0 1
3-Oct-10 1 0 0 1 0 0

Basically, everything gets collapsed into frequencies. The only exception to this is for the scalar Tone variable. Rather than a count, I need the mean for that variable across all records with the same date.

I tried the AGGREGATE command and could get the count of total stories by date (because each record is one story), as well as the mean of the scale variable. Everything else is messed up, because aggregation should be conditional based on the value of the variables. (That is, all interviews are aggregated, but are not mixed with briefs.) I tried CASESTOVARS, but then I end up with inconsistent numbers of variables per record, because the number of original records being collapsed is not constant. So, for example, the record for 1 OCT would have Story_Type.1 to Story_Type.6, while the record for 3 OCT would have only Story_Type.1. Ditto for the other variables.

Can anyone point me in the right direction here? Iâ€™m trying to figure out how to do this all with loops, but it seems very messy.

Thanks,

Mike Donatello

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Mike Donatello

Re: Combination of AGGREGATE and CASESTOVARS?

In reply to this post by Mike Donatello

I need something that runs without intervention. This is an ongoing
project, with several years' worth of data. Just not practical to do it by
hand.

On Mon, 17 Jan 2011 11:57:36 -0700, ViAnn Beadle <[hidden email]> wrote:

>Why not use crosstabs?
>
>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Mike Donatello
>Sent: Monday, January 17, 2011 11:28 AM
>To: [hidden email]
>Subject: Combination of AGGREGATE and CASESTOVARS?
>
>Hi folks,
>
>Having some trouble remembering back to my syntax days and was hoping I
could get some help.
>
>I have a data file in which each record comprises a specific news story,
with publication date and codes for various content elements. Each of the
content elements may contain one of several values, but the values are all
single-select (i.e., no multiple values on a particular content variable).

>The number of stories (and records) for a particular date is variable.
>
>Record_ID Date Story_Type Medium Focus Tone
>109065 1-Oct-10 Standard story Network_TV Primary focus 7
>109066 1-Oct-10 Standard story Network_TV Primary focus 6
>109067 1-Oct-10 Standard story Newspaper Primary focus 6
>109070 1-Oct-10 Standard story Newspaper Primary focus 4
>109071 1-Oct-10 Interview Newspaper Secondary focus 5
>109072 1-Oct-10 Interview Wire service Primary focus 5
>109073 2-Oct-10 Interview Wire service Primary focus 3
>109074 2-Oct-10 Standard story Blog Secondary focus 1
>109076 2-Oct-10 Standard story Newspaper Secondary focus 4
>109077 3-Oct-10 Brief Wire service Primary focus 4
>
>
>What IÃ¢â¬â¢d like to do is to reorder these data so that each record

represents a particular date, followed by counts of specific codes for each
content variable. So, my ideal format would look something like:
>
>Date Total_Stories Std_stories Intvws Briefs Network_TV
>Newspaper ETC.
>1-Oct-10 6 4 2 0 2 3
>2-Oct-10 3 2 1 0 0 1
>3-Oct-10 1 0 0 1 0 0
>
>
>Basically, everything gets collapsed into frequencies. The only exception
to this is for the scalar Tone variable. Rather than a count, I need the
mean for that variable across all records with the same date.
>
>I tried the AGGREGATE command and could get the count of total stories by
date (because each record is one story), as well as the mean of the scale
variable. Everything else is messed up, because aggregation should be
conditional based on the value of the variables. (That is, all interviews
are aggregated, but are not mixed with briefs.) I tried CASESTOVARS, but
then I end up with inconsistent numbers of variables per record, because the
number of original records being collapsed is not constant. So, for
example, the record for 1 OCT would have Story_Type.1 to Story_Type.6, while
the record for 3 OCT would have only Story_Type.1. Ditto for the other
variables.
>
>Can anyone point me in the right direction here? IÃ¢â¬â¢m trying to figure
out how to do this all with loops, but it seems very messy.
>
>Thanks,
>
>Mike Donatello
>
>=====================
>To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command SIGNOFF SPSSX-L For a list of
commands to manage subscriptions, send the command INFO REFCARD
>
>=====================
>To manage your subscription to SPSSX-L, send a message to
>[hidden email] (not to SPSSX-L), with no body text except the
>command. To leave the list, send the command
>SIGNOFF SPSSX-L
>For a list of commands to manage subscriptions, send the command
>INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Maguin, Eugene

Re: Combination of AGGREGATE and CASESTOVARS?

In reply to this post by Mike Donatello

Mike,

I think you can do this pretty easily with aggregate. The one problem is that you seem to have text codes for a number of your variables. But that can be easily fixed via a recode.

The key subcommands are
PGT(varlist,value) Percentage of cases greater than the specified value.
PLT(varlist,value) Percentage of cases less than the specified value.
PIN(varlist,value1,value2) Percentage of cases between value1 and value2, inclusive.

For instance your variable story_type has three values: Standard story, Interview, Brief. So let Standard story=1, Interview=2, Brief=3.
Then aggregate by date creating the new variables Standard, Interview, Brief

Aggregate outfile=*/break=date/standard=plt(story_type,1.5)/
interview=pin(story_type,1.5,2.5)/
brief=pgt(story_type,2.5).

Read over the documentation for the Aggregate command and you can get the rest of the summary variables that you want.

Note that if you do not like percentages, there are a corresonding set of functions that give fractions computed in the same manner.

Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Mike Donatello
Sent: Monday, January 17, 2011 1:28 PM
To: [hidden email]
Subject: Combination of AGGREGATE and CASESTOVARS?

Hi folks,

Having some trouble remembering back to my syntax days and was hoping I
could get some help.

I have a data file in which each record comprises a specific news story,
with publication date and codes for various content elements. Each of the
content elements may contain one of several values, but the values are all
single-select (i.e., no multiple values on a particular content variable).
The number of stories (and records) for a particular date is variable.

Record_ID Date Story_Type Medium Focus Tone
109065 1-Oct-10 Standard story Network_TV Primary focus 7
109066 1-Oct-10 Standard story Network_TV Primary focus 6
109067 1-Oct-10 Standard story Newspaper Primary focus 6
109070 1-Oct-10 Standard story Newspaper Primary focus 4
109071 1-Oct-10 Interview Newspaper Secondary focus 5
109072 1-Oct-10 Interview Wire service Primary focus 5
109073 2-Oct-10 Interview Wire service Primary focus 3
109074 2-Oct-10 Standard story Blog Secondary focus 1
109076 2-Oct-10 Standard story Newspaper Secondary focus 4
109077 3-Oct-10 Brief Wire service Primary focus 4

What Iâ€™d like to do is to reorder these data so that each record represents
a particular date, followed by counts of specific codes for each content
variable. So, my ideal format would look something like:

Date Total_Stories Std_stories Intvws Briefs Network_TV
Newspaper ETC.
1-Oct-10 6 4 2 0 2 3
2-Oct-10 3 2 1 0 0 1
3-Oct-10 1 0 0 1 0 0

Basically, everything gets collapsed into frequencies. The only exception
to this is for the scalar Tone variable. Rather than a count, I need the
mean for that variable across all records with the same date.

I tried the AGGREGATE command and could get the count of total stories by
date (because each record is one story), as well as the mean of the scale
variable. Everything else is messed up, because aggregation should be
conditional based on the value of the variables. (That is, all interviews
are aggregated, but are not mixed with briefs.) I tried CASESTOVARS, but
then I end up with inconsistent numbers of variables per record, because the
number of original records being collapsed is not constant. So, for
example, the record for 1 OCT would have Story_Type.1 to Story_Type.6, while
the record for 3 OCT would have only Story_Type.1. Ditto for the other
variables.

Can anyone point me in the right direction here? Iâ€™m trying to figure out
how to do this all with loops, but it seems very messy.

Thanks,

Mike Donatello

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

David Marso

Re: Combination of AGGREGATE and CASESTOVARS?

Administrator

In reply to this post by Mike Donatello

Hi Mike,
Simply create binary 0/1 variables for the various attributes then use SUM within the aggregate.
Use Mean for Tone, N for Total.

Something like:
COMPUTE ST_Standard=(Story_Type EQ "Standard story").
COMPUTE ST_Interview=(Story_Type EQ "Interview").
COMPUTE ST_Brief=(Story_Type EQ "Brief").
...
COMPUTE MD_Network_TV=(Medium EQ "Network_TV").
etc....
AGGREGATE OUTFILE *
/ BREAK =Date
/ Total = N
/ ST_Standard ST_Interview ... MD_Network_TV .... etc
=SUM(ST_Standard ST_Interview ... MD_Network_TV .... etc...)
/ Tone = MEAN(Tone).
etc...
HTH, David

Mike Donatello-2 wrote

Hi folks,

Having some trouble remembering back to my syntax days and was hoping I
could get some help.

I have a data file in which each record comprises a specific news story,
with publication date and codes for various content elements. Each of the
content elements may contain one of several values, but the values are all
single-select (i.e., no multiple values on a particular content variable).
The number of stories (and records) for a particular date is variable.

Record_ID Date Story_Type Medium Focus Tone
109065 1-Oct-10 Standard story Network_TV Primary focus 7
109066 1-Oct-10 Standard story Network_TV Primary focus 6
109067 1-Oct-10 Standard story Newspaper Primary focus 6
109070 1-Oct-10 Standard story Newspaper Primary focus 4
109071 1-Oct-10 Interview Newspaper Secondary focus 5
109072 1-Oct-10 Interview Wire service Primary focus 5
109073 2-Oct-10 Interview Wire service Primary focus 3
109074 2-Oct-10 Standard story Blog Secondary focus 1
109076 2-Oct-10 Standard story Newspaper Secondary focus 4
109077 3-Oct-10 Brief Wire service Primary focus 4

What Iâd like to do is to reorder these data so that each record represents
a particular date, followed by counts of specific codes for each content
variable. So, my ideal format would look something like:

Date Total_Stories Std_stories Intvws Briefs Network_TV
Newspaper ETC.
1-Oct-10 6 4 2 0 2 3
2-Oct-10 3 2 1 0 0 1
3-Oct-10 1 0 0 1 0 0

Basically, everything gets collapsed into frequencies. The only exception
to this is for the scalar Tone variable. Rather than a count, I need the
mean for that variable across all records with the same date.

I tried the AGGREGATE command and could get the count of total stories by
date (because each record is one story), as well as the mean of the scale
variable. Everything else is messed up, because aggregation should be
conditional based on the value of the variables. (That is, all interviews
are aggregated, but are not mixed with briefs.) I tried CASESTOVARS, but
then I end up with inconsistent numbers of variables per record, because the
number of original records being collapsed is not constant. So, for
example, the record for 1 OCT would have Story_Type.1 to Story_Type.6, while
the record for 3 OCT would have only Story_Type.1. Ditto for the other
variables.

Can anyone point me in the right direction here? Iâm trying to figure out
how to do this all with loops, but it seems very messy.

Thanks,

Mike Donatello

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Mike Donatello

Re: Combination of AGGREGATE and CASESTOVARS?

In reply to this post by Mike Donatello

Thanks David and Gene. I should've known it would be something straightforward.

One remaining question: If I'm using the /OUTFILE=filename option, is there
a way to carry over any variables from the original file? If so, I don't
see it in the documentation. It seems like ADDVARIABLES is the only way to
maintain any of the originals, but then I'm back to having too many cases,
since the aggregated results are appended to each of the original records,
rather than having the aggregated cases replace the original records.

Any ideas?

Thanks very much,

MD

On Mon, 17 Jan 2011 12:19:29 -0800, David Marso <[hidden email]> wrote:

>Hi Mike,
>Simply create binary 0/1 variables for the various attributes then use SUM
>within the aggregate.
>Use Mean for Tone, N for Total.
>
>Something like:
>COMPUTE ST_Standard=(Story_Type EQ "Standard story").
>COMPUTE ST_Interview=(Story_Type EQ "Interview").
>COMPUTE ST_Brief=(Story_Type EQ "Brief").
>...
>COMPUTE MD_Network_TV=(Medium EQ "Network_TV").
>etc....
>AGGREGATE OUTFILE *
> / BREAK =Date
> / Total = N
> / ST_Standard ST_Interview ... MD_Network_TV .... etc
> =SUM(ST_Standard ST_Interview ... MD_Network_TV .... etc...)
> / Tone = MEAN(Tone).
>etc...
>HTH, David

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

David Marso

Re: Combination of AGGREGATE and CASESTOVARS?

Administrator

Hi Mike,
I assume these other variables are CONSTANT across a particular date.
In this case simply add them to the list of break variables.
AGGREGATE OUTFILE *
/ BREAK date const1... constK
/ Total = N / blah blah blah as before....
If they are not constant across a date then the question must be reformulated ;-)
HTH, David
----

Mike Donatello-2 wrote

Thanks David and Gene. I should've known it would be something straightforward.

One remaining question: If I'm using the /OUTFILE=filename option, is there
a way to carry over any variables from the original file? If so, I don't
see it in the documentation. It seems like ADDVARIABLES is the only way to
maintain any of the originals, but then I'm back to having too many cases,
since the aggregated results are appended to each of the original records,
rather than having the aggregated cases replace the original records.

Any ideas?

Thanks very much,

MD

On Mon, 17 Jan 2011 12:19:29 -0800, David Marso <david.marso@gmail.com> wrote:

>Hi Mike,
>Simply create binary 0/1 variables for the various attributes then use SUM
>within the aggregate.
>Use Mean for Tone, N for Total.
>
>Something like:
>COMPUTE ST_Standard=(Story_Type EQ "Standard story").
>COMPUTE ST_Interview=(Story_Type EQ "Interview").
>COMPUTE ST_Brief=(Story_Type EQ "Brief").
>...
>COMPUTE MD_Network_TV=(Medium EQ "Network_TV").
>etc....
>AGGREGATE OUTFILE *
> / BREAK =Date
> / Total = N
> / ST_Standard ST_Interview ... MD_Network_TV .... etc
> =SUM(ST_Standard ST_Interview ... MD_Network_TV .... etc...)
> / Tone = MEAN(Tone).
>etc...
>HTH, David

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD