Combination of AGGREGATE and CASESTOVARS?

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Combination of AGGREGATE and CASESTOVARS?

Mike Donatello
Hi folks,

Having some trouble remembering back to my syntax days and was hoping I
could get some help.

I have a data file in which each record comprises a specific news story,
with publication date and codes for various content elements.  Each of the
content elements may contain one of several values, but the values are all
single-select (i.e., no multiple values on a particular content variable).
The number of stories (and records) for a particular date is variable.

Record_ID   Date      Story_Type        Medium        Focus             Tone
109065      1-Oct-10  Standard story    Network_TV    Primary focus     7
109066      1-Oct-10  Standard story    Network_TV    Primary focus     6
109067      1-Oct-10  Standard story    Newspaper     Primary focus     6
109070      1-Oct-10  Standard story    Newspaper     Primary focus     4
109071      1-Oct-10  Interview         Newspaper     Secondary focus   5
109072      1-Oct-10  Interview         Wire service  Primary focus     5
109073      2-Oct-10  Interview         Wire service  Primary focus     3
109074      2-Oct-10  Standard story    Blog          Secondary focus   1
109076      2-Oct-10  Standard story    Newspaper     Secondary focus   4
109077      3-Oct-10  Brief             Wire service  Primary focus     4


What I’d like to do is to reorder these data so that each record represents
a particular date, followed by counts of specific codes for each content
variable.  So, my ideal format would look something like:

Date       Total_Stories   Std_stories   Intvws   Briefs   Network_TV
Newspaper   ETC.
1-Oct-10   6               4             2        0        2            3
2-Oct-10   3               2             1        0        0            1
3-Oct-10   1               0             0        1        0            0


Basically, everything gets collapsed into frequencies.  The only exception
to this is for the scalar Tone variable.  Rather than a count, I need the
mean for that variable across all records with the same date.

I tried the AGGREGATE command and could get the count of total stories by
date (because each record is one story), as well as the mean of the scale
variable.  Everything else is messed up, because aggregation should be
conditional based on the value of the variables.  (That is, all interviews
are aggregated, but are not mixed with briefs.)  I tried CASESTOVARS, but
then I end up with inconsistent numbers of variables per record, because the
number of original records being collapsed is not constant.  So, for
example, the record for 1 OCT would have Story_Type.1 to Story_Type.6, while
the record for 3 OCT would have only Story_Type.1.  Ditto for the other
variables.

Can anyone point me in the right direction here?  I’m trying to figure out
how to do this all with loops, but it seems very messy.

Thanks,

Mike Donatello

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Combination of AGGREGATE and CASESTOVARS?

ViAnn Beadle
Why not use crosstabs?

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Mike Donatello
Sent: Monday, January 17, 2011 11:28 AM
To: [hidden email]
Subject: Combination of AGGREGATE and CASESTOVARS?

Hi folks,

Having some trouble remembering back to my syntax days and was hoping I could get some help.

I have a data file in which each record comprises a specific news story, with publication date and codes for various content elements.  Each of the content elements may contain one of several values, but the values are all single-select (i.e., no multiple values on a particular content variable).
The number of stories (and records) for a particular date is variable.

Record_ID   Date      Story_Type        Medium        Focus             Tone
109065      1-Oct-10  Standard story    Network_TV    Primary focus     7
109066      1-Oct-10  Standard story    Network_TV    Primary focus     6
109067      1-Oct-10  Standard story    Newspaper     Primary focus     6
109070      1-Oct-10  Standard story    Newspaper     Primary focus     4
109071      1-Oct-10  Interview         Newspaper     Secondary focus   5
109072      1-Oct-10  Interview         Wire service  Primary focus     5
109073      2-Oct-10  Interview         Wire service  Primary focus     3
109074      2-Oct-10  Standard story    Blog          Secondary focus   1
109076      2-Oct-10  Standard story    Newspaper     Secondary focus   4
109077      3-Oct-10  Brief             Wire service  Primary focus     4


What I’d like to do is to reorder these data so that each record represents a particular date, followed by counts of specific codes for each content variable.  So, my ideal format would look something like:

Date       Total_Stories   Std_stories   Intvws   Briefs   Network_TV
Newspaper   ETC.
1-Oct-10   6               4             2        0        2            3
2-Oct-10   3               2             1        0        0            1
3-Oct-10   1               0             0        1        0            0


Basically, everything gets collapsed into frequencies.  The only exception to this is for the scalar Tone variable.  Rather than a count, I need the mean for that variable across all records with the same date.

I tried the AGGREGATE command and could get the count of total stories by date (because each record is one story), as well as the mean of the scale variable.  Everything else is messed up, because aggregation should be conditional based on the value of the variables.  (That is, all interviews are aggregated, but are not mixed with briefs.)  I tried CASESTOVARS, but then I end up with inconsistent numbers of variables per record, because the number of original records being collapsed is not constant.  So, for example, the record for 1 OCT would have Story_Type.1 to Story_Type.6, while the record for 3 OCT would have only Story_Type.1.  Ditto for the other variables.

Can anyone point me in the right direction here?  I’m trying to figure out how to do this all with loops, but it seems very messy.

Thanks,

Mike Donatello

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Combination of AGGREGATE and CASESTOVARS?

Mike Donatello
In reply to this post by Mike Donatello
I need something that runs without intervention.  This is an ongoing
project, with several years' worth of data.  Just not practical to do it by
hand.

On Mon, 17 Jan 2011 11:57:36 -0700, ViAnn Beadle <[hidden email]> wrote:

>Why not use crosstabs?
>
>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Mike Donatello
>Sent: Monday, January 17, 2011 11:28 AM
>To: [hidden email]
>Subject: Combination of AGGREGATE and CASESTOVARS?
>
>Hi folks,
>
>Having some trouble remembering back to my syntax days and was hoping I
could get some help.
>
>I have a data file in which each record comprises a specific news story,
with publication date and codes for various content elements.  Each of the
content elements may contain one of several values, but the values are all
single-select (i.e., no multiple values on a particular content variable).

>The number of stories (and records) for a particular date is variable.
>
>Record_ID   Date      Story_Type        Medium        Focus             Tone
>109065      1-Oct-10  Standard story    Network_TV    Primary focus     7
>109066      1-Oct-10  Standard story    Network_TV    Primary focus     6
>109067      1-Oct-10  Standard story    Newspaper     Primary focus     6
>109070      1-Oct-10  Standard story    Newspaper     Primary focus     4
>109071      1-Oct-10  Interview         Newspaper     Secondary focus   5
>109072      1-Oct-10  Interview         Wire service  Primary focus     5
>109073      2-Oct-10  Interview         Wire service  Primary focus     3
>109074      2-Oct-10  Standard story    Blog          Secondary focus   1
>109076      2-Oct-10  Standard story    Newspaper     Secondary focus   4
>109077      3-Oct-10  Brief             Wire service  Primary focus     4
>
>
>What I’d like to do is to reorder these data so that each record
represents a particular date, followed by counts of specific codes for each
content variable.  So, my ideal format would look something like:
>
>Date       Total_Stories   Std_stories   Intvws   Briefs   Network_TV
>Newspaper   ETC.
>1-Oct-10   6               4             2        0        2            3
>2-Oct-10   3               2             1        0        0            1
>3-Oct-10   1               0             0        1        0            0
>
>
>Basically, everything gets collapsed into frequencies.  The only exception
to this is for the scalar Tone variable.  Rather than a count, I need the
mean for that variable across all records with the same date.
>
>I tried the AGGREGATE command and could get the count of total stories by
date (because each record is one story), as well as the mean of the scale
variable.  Everything else is messed up, because aggregation should be
conditional based on the value of the variables.  (That is, all interviews
are aggregated, but are not mixed with briefs.)  I tried CASESTOVARS, but
then I end up with inconsistent numbers of variables per record, because the
number of original records being collapsed is not constant.  So, for
example, the record for 1 OCT would have Story_Type.1 to Story_Type.6, while
the record for 3 OCT would have only Story_Type.1.  Ditto for the other
variables.
>
>Can anyone point me in the right direction here?  I’m trying to figure
out how to do this all with loops, but it seems very messy.
>
>Thanks,
>
>Mike Donatello
>
>=====================
>To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command SIGNOFF SPSSX-L For a list of
commands to manage subscriptions, send the command INFO REFCARD
>
>=====================
>To manage your subscription to SPSSX-L, send a message to
>[hidden email] (not to SPSSX-L), with no body text except the
>command. To leave the list, send the command
>SIGNOFF SPSSX-L
>For a list of commands to manage subscriptions, send the command
>INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Combination of AGGREGATE and CASESTOVARS?

Maguin, Eugene
In reply to this post by Mike Donatello
Mike,

I think you can do this pretty easily with aggregate. The one problem is that you seem to have text codes for a number of your variables. But that can be easily fixed via a recode.

The key subcommands are
PGT(varlist,value) Percentage of cases greater than the specified value.
PLT(varlist,value) Percentage of cases less than the specified value.
PIN(varlist,value1,value2) Percentage of cases between value1 and value2, inclusive.

For instance your variable story_type has three values: Standard story, Interview, Brief. So let Standard story=1, Interview=2, Brief=3.
Then aggregate by date creating the new variables Standard, Interview, Brief

Aggregate outfile=*/break=date/standard=plt(story_type,1.5)/
   interview=pin(story_type,1.5,2.5)/
   brief=pgt(story_type,2.5).

Read over the documentation for the Aggregate command and you can get the rest of the summary variables that you want.

Note that if you do not like percentages, there are a corresonding set of functions that give fractions computed in the same manner.

Gene Maguin



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Mike Donatello
Sent: Monday, January 17, 2011 1:28 PM
To: [hidden email]
Subject: Combination of AGGREGATE and CASESTOVARS?

Hi folks,

Having some trouble remembering back to my syntax days and was hoping I
could get some help.

I have a data file in which each record comprises a specific news story,
with publication date and codes for various content elements.  Each of the
content elements may contain one of several values, but the values are all
single-select (i.e., no multiple values on a particular content variable).
The number of stories (and records) for a particular date is variable.

Record_ID   Date      Story_Type        Medium        Focus             Tone
109065      1-Oct-10  Standard story    Network_TV    Primary focus     7
109066      1-Oct-10  Standard story    Network_TV    Primary focus     6
109067      1-Oct-10  Standard story    Newspaper     Primary focus     6
109070      1-Oct-10  Standard story    Newspaper     Primary focus     4
109071      1-Oct-10  Interview         Newspaper     Secondary focus   5
109072      1-Oct-10  Interview         Wire service  Primary focus     5
109073      2-Oct-10  Interview         Wire service  Primary focus     3
109074      2-Oct-10  Standard story    Blog          Secondary focus   1
109076      2-Oct-10  Standard story    Newspaper     Secondary focus   4
109077      3-Oct-10  Brief             Wire service  Primary focus     4


What I’d like to do is to reorder these data so that each record represents
a particular date, followed by counts of specific codes for each content
variable.  So, my ideal format would look something like:

Date       Total_Stories   Std_stories   Intvws   Briefs   Network_TV
Newspaper   ETC.
1-Oct-10   6               4             2        0        2            3
2-Oct-10   3               2             1        0        0            1
3-Oct-10   1               0             0        1        0            0


Basically, everything gets collapsed into frequencies.  The only exception
to this is for the scalar Tone variable.  Rather than a count, I need the
mean for that variable across all records with the same date.

I tried the AGGREGATE command and could get the count of total stories by
date (because each record is one story), as well as the mean of the scale
variable.  Everything else is messed up, because aggregation should be
conditional based on the value of the variables.  (That is, all interviews
are aggregated, but are not mixed with briefs.)  I tried CASESTOVARS, but
then I end up with inconsistent numbers of variables per record, because the
number of original records being collapsed is not constant.  So, for
example, the record for 1 OCT would have Story_Type.1 to Story_Type.6, while
the record for 3 OCT would have only Story_Type.1.  Ditto for the other
variables.

Can anyone point me in the right direction here?  I’m trying to figure out
how to do this all with loops, but it seems very messy.

Thanks,

Mike Donatello

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Combination of AGGREGATE and CASESTOVARS?

David Marso
Administrator
In reply to this post by Mike Donatello
Hi Mike,
Simply create binary 0/1 variables for the various attributes then use SUM within the aggregate.
Use Mean for Tone, N for Total.

Something like:
COMPUTE ST_Standard=(Story_Type EQ "Standard story").
COMPUTE ST_Interview=(Story_Type EQ "Interview").
COMPUTE ST_Brief=(Story_Type EQ "Brief").
...
COMPUTE MD_Network_TV=(Medium EQ "Network_TV").
etc....
AGGREGATE OUTFILE *
                / BREAK =Date
                / Total = N
                / ST_Standard     ST_Interview ... MD_Network_TV .... etc
        =SUM(ST_Standard     ST_Interview ... MD_Network_TV .... etc...)
               / Tone = MEAN(Tone).
etc...
HTH, David




Mike Donatello-2 wrote
Hi folks,

Having some trouble remembering back to my syntax days and was hoping I
could get some help.

I have a data file in which each record comprises a specific news story,
with publication date and codes for various content elements.  Each of the
content elements may contain one of several values, but the values are all
single-select (i.e., no multiple values on a particular content variable).
The number of stories (and records) for a particular date is variable.

Record_ID   Date      Story_Type        Medium        Focus             Tone
109065      1-Oct-10  Standard story    Network_TV    Primary focus     7
109066      1-Oct-10  Standard story    Network_TV    Primary focus     6
109067      1-Oct-10  Standard story    Newspaper     Primary focus     6
109070      1-Oct-10  Standard story    Newspaper     Primary focus     4
109071      1-Oct-10  Interview         Newspaper     Secondary focus   5
109072      1-Oct-10  Interview         Wire service  Primary focus     5
109073      2-Oct-10  Interview         Wire service  Primary focus     3
109074      2-Oct-10  Standard story    Blog          Secondary focus   1
109076      2-Oct-10  Standard story    Newspaper     Secondary focus   4
109077      3-Oct-10  Brief             Wire service  Primary focus     4


What I’d like to do is to reorder these data so that each record represents
a particular date, followed by counts of specific codes for each content
variable.  So, my ideal format would look something like:

Date       Total_Stories   Std_stories   Intvws   Briefs   Network_TV
Newspaper   ETC.
1-Oct-10   6               4             2        0        2            3
2-Oct-10   3               2             1        0        0            1
3-Oct-10   1               0             0        1        0            0


Basically, everything gets collapsed into frequencies.  The only exception
to this is for the scalar Tone variable.  Rather than a count, I need the
mean for that variable across all records with the same date.

I tried the AGGREGATE command and could get the count of total stories by
date (because each record is one story), as well as the mean of the scale
variable.  Everything else is messed up, because aggregation should be
conditional based on the value of the variables.  (That is, all interviews
are aggregated, but are not mixed with briefs.)  I tried CASESTOVARS, but
then I end up with inconsistent numbers of variables per record, because the
number of original records being collapsed is not constant.  So, for
example, the record for 1 OCT would have Story_Type.1 to Story_Type.6, while
the record for 3 OCT would have only Story_Type.1.  Ditto for the other
variables.

Can anyone point me in the right direction here?  I’m trying to figure out
how to do this all with loops, but it seems very messy.

Thanks,

Mike Donatello

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Combination of AGGREGATE and CASESTOVARS?

Mike Donatello
In reply to this post by Mike Donatello
Thanks David and Gene.  I should've known it would be something straightforward.

One remaining question: If I'm using the /OUTFILE=filename option, is there
a way to carry over any variables from the original file?  If so, I don't
see it in the documentation.  It seems like ADDVARIABLES is the only way to
maintain any of the originals, but then I'm back to having too many cases,
since the aggregated results are appended to each of the original records,
rather than having the aggregated cases replace the original records.

Any ideas?

Thanks very much,

MD


On Mon, 17 Jan 2011 12:19:29 -0800, David Marso <[hidden email]> wrote:

>Hi Mike,
>Simply create binary 0/1 variables for the various attributes then use SUM
>within the aggregate.
>Use Mean for Tone, N for Total.
>
>Something like:
>COMPUTE ST_Standard=(Story_Type EQ "Standard story").
>COMPUTE ST_Interview=(Story_Type EQ "Interview").
>COMPUTE ST_Brief=(Story_Type EQ "Brief").
>...
>COMPUTE MD_Network_TV=(Medium EQ "Network_TV").
>etc....
>AGGREGATE OUTFILE *
>                / BREAK =Date
>                / Total = N
>                / ST_Standard     ST_Interview ... MD_Network_TV .... etc
>        =SUM(ST_Standard     ST_Interview ... MD_Network_TV .... etc...)
>               / Tone = MEAN(Tone).
>etc...
>HTH, David

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Combination of AGGREGATE and CASESTOVARS?

David Marso
Administrator
Hi Mike,
I assume these other variables are CONSTANT across a particular date.
In this case simply add them to the list of break variables.
AGGREGATE OUTFILE *
   / BREAK date const1... constK
   / Total = N / blah blah blah as before....
If they are not constant across a date then the question must be reformulated ;-)
HTH, David
----
 
Mike Donatello-2 wrote
Thanks David and Gene.  I should've known it would be something straightforward.

One remaining question: If I'm using the /OUTFILE=filename option, is there
a way to carry over any variables from the original file?  If so, I don't
see it in the documentation.  It seems like ADDVARIABLES is the only way to
maintain any of the originals, but then I'm back to having too many cases,
since the aggregated results are appended to each of the original records,
rather than having the aggregated cases replace the original records.

Any ideas?

Thanks very much,

MD


On Mon, 17 Jan 2011 12:19:29 -0800, David Marso <david.marso@gmail.com> wrote:

>Hi Mike,
>Simply create binary 0/1 variables for the various attributes then use SUM
>within the aggregate.
>Use Mean for Tone, N for Total.
>
>Something like:
>COMPUTE ST_Standard=(Story_Type EQ "Standard story").
>COMPUTE ST_Interview=(Story_Type EQ "Interview").
>COMPUTE ST_Brief=(Story_Type EQ "Brief").
>...
>COMPUTE MD_Network_TV=(Medium EQ "Network_TV").
>etc....
>AGGREGATE OUTFILE *
>                / BREAK =Date
>                / Total = N
>                / ST_Standard     ST_Interview ... MD_Network_TV .... etc
>        =SUM(ST_Standard     ST_Interview ... MD_Network_TV .... etc...)
>               / Tone = MEAN(Tone).
>etc...
>HTH, David

=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"