Combining the output from multiple crosstabs

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Combining the output from multiple crosstabs

Megan Misovic
Hello,

I am doing survey analysis and for each of the variables in my survey I am
interested in both the total count, total weighted percentage, as well as
the count and weight broken down by gender and geographical region.

My survey has hundreds of variables.  I feel there must be a better way of
disaggregating each of these by gender and region than by manually running
crosstabs and copying the counts and percentages into an excel spreadsheet.

What I would ideally like is a table structured like this:

Variable Name | Overall Count | Overall % | Men count | Men % | Women count
| Women % | Geographic region #1 count | Geographic region #2 % | etc.

Does anyone have any advice?  I don't have the budget to buy SPSS 'tables'
and I have been unable to find any help on-line.

Thanks and kind regards,

Megan

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Combining the output from multiple crosstabs

Maguin, Eugene
Megan,

Before I offer a suggestion I'd like to make sure I understand your desired
output because either I don't or you have an error in your description.

Let's say you had a question like 'Steven Harper should be reelected?
(Yes/No). Your crosstab would be a 2 by 2 plus marginal totals. Exactly what
numbers from the crosstab do you want to be in your table? You have three
choices: 'No' counts and percents for males, females and total sample; 'Yes'
counts and percents for the same, and total column counts and percents for a
gender by item crosstab. Also, what happens for DK/NA/Refused responses?

So I'm wondering if you aren't looking for a final result that looks like
this.

Variable Name | Value/Value label | Overall Count | Overall % |
    Men count | Men % | Women count | Women % |
    Geographic region #1 count | Geographic region #2 % | etc.


Thanks, Gene Maguin



>>I am doing survey analysis and for each of the variables in my survey I am
interested in both the total count, total weighted percentage, as well as
the count and weight broken down by gender and geographical region.

My survey has hundreds of variables.  I feel there must be a better way of
disaggregating each of these by gender and region than by manually running
crosstabs and copying the counts and percentages into an excel spreadsheet.

What I would ideally like is a table structured like this:

Variable Name | Overall Count | Overall % | Men count | Men % | Women count
| Women % | Geographic region #1 count | Geographic region #2 % | etc.

Does anyone have any advice?  I don't have the budget to buy SPSS 'tables'
and I have been unable to find any help on-line.

Thanks and kind regards,

Megan

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Combining the output from multiple crosstabs

Maguin, Eugene
Megan,

Ok, that is very helpful. Thanks.

So far as I know, there are two methods to get to what you want. Both
involve complexity. This is going to be lengthy so bear with me. I made up a
little dataset to use and it embodies the following expectations: 1) you
have missing data; 2) all DVs are yes/no response format.

data list / id sex region dv1 dv2 dv3(f2.0,5(1x,f1.0)).
begin data
01 1 1 9 2 1
02 9 1 2 1 1
03 1 2 2 9 2
04 2 9 2 1 2
05 1 3 1 2 1
06 2 3 1 1 9
07 1 3 2 2 2
08 2 2 1 1 1
09 9 1 2 9 2
10 2 9 2 1 2
11 1 2 9 2 1
12 9 3 1 1 1
13 1 9 2 9 2
14 2 1 2 1 1
15 1 1 1 2 9
end data.
execute.
*  dv1 to dv2: 1=yes; 2=no. sex: 1=male; 2=female.
missing values sex region dv1 dv2 dv3(9).

*  METHOD 1.
procedure output outfile='c:\mixedtests\meganf1.txt'.
crosstabs variables=sex dv1 dv2 dv3(1,2)/tables=sex by dv1 dv2 dv3/
   cells=count col/write all.

procedure output outfile='c:\mixedtests\meganf2.txt'.
crosstabs variables=region(1,3) dv1 dv2 dv3(1,2)/
   tables=region by dv1 dv2 dv3/cells=count col/write all.

*  METHOD 2.
oms /select tables/if subtypes='Crosstabulation'/
   destination format=sav outfile='c:\mixedtests\meganf1.sav'.

crosstabs sex by dv1 dv2 dv3/cells=count col.

omsend.

oms /select tables/if subtypes='Crosstabulation'/
   destination format=sav outfile='c:\mixedtests\meganf2.sav'.

crosstabs region by dv1 dv2 dv3/cells=count col.

omsend.

oms /select tables/if subtypes='Crosstabulation'/
   destination format=sav outfile='c:\mixedtests\meganf2.sav'.

crosstabs region by dv1 dv2 dv3/cells=count row.

omsend.


Commentary. Overall, my preference is for method 1 because I think it is
easier to work with. However, look at the output for method 2. It will seem
confusing especially if you have never used OMS before. The key
(unfortunate) thing is that each table adds columns to the file. So, 200
tables, maybe 400 or more columns. I think it's not a big deal; just an
additional step to collapse columms. Method 1, in contrast, adds rows. I'm
going to procede with method 1.


Method 1 commentary.
This method uses the write subcommand of the crosstabs command (read up on
the crosstabs documentation in the syntax reference). Note also that I used
the Integer mode rather than the more commonly used general model. The
output file for this is a text file (see syntax ref for further
documentation) and looks like this for the region variable. There are as
many rows as there are DVs times the range of values for region times 2 (for
yes and no). Tableid corresponds to the list order of the 'column' variables
on the generating crosstabs command. That the variables are not named here
is unfortunate but trivial. The values in the rowval and colval are the
values of the row and column variables, respectively.

 Tag TableID count  rowval   colval
   1   1       1       1       1
   1   1       1       2       1
   1   1       3       3       1
   1   1       3       1       2
   1   1       1       2       2
   1   1       1       3       2
   1   2       2       1       1
   1   2       1       2       1
   1   2       2       3       1
   1   2       2       1       2
   1   2       1       2       2
   1   2       2       3       2
   1   3       3       1       1
   1   3       2       2       1
   1   3       2       3       1
   1   3       1       1       2
   1   3       1       2       2
   1   3       1       3       2


The rest of the syntax is this.

data list file='c:\mixedtests\meganf1.txt'/
   tag 4 tableid 8 count 9-16 rowid 24 colid 32.
execute.

casestovars /id=tableid/drop=tag.

compute gendertotaln=sum(count.1 to count.4).
compute gendertotalyes=sum(count.1,count.2).
compute gendertotalpcnt=100*gendertotalyes/gendertotaln.
compute malepcnt=100*count.1/gendertotalyes.
compute femalepcnt=100*count.2/gendertotalyes.
execute.
format gendertotaln gendertotalyes(f2.0) gendertotalpcnt malepcnt
femalepcnt(f5.2).
list count.1 to count.4 rowid.1 to rowid.4 colid.1 to colid.4
   gendertotaln gendertotalyes gendertotalpcnt malepcnt femalepcnt.

save outfile='c:\mixedtests\meganf1a.sav'/keep=tableid count.1 count.2
   gendertotalyes malepcnt femalepcnt gendertotalpcnt/
   rename=(count.1 count.2=malen femalen).


****************************************************************************
******.
data list file='c:\mixedtests\meganf2.txt'/
   tag 4 tableid 8 count 9-16 rowid 24 colid 32.
execute.

casestovars /id=tableid/drop=tag.

compute regiontotaln=sum(count.1 to count.6).
compute regiontotalyes=sum(count.1,count.2,count.3).
compute regiontotalpcnt=100*regiontotalyes/regiontotaln.
compute region1pcnt=100*count.1/regiontotalyes.
compute region2pcnt=100*count.2/regiontotalyes.
compute region3pcnt=100*count.3/regiontotalyes.
execute.
format regiontotaln regiontotalyes(f2.0) regiontotalpcnt region1pcnt
region2pcnt region3pcnt(f5.2).
list count.1 to count.6 rowid.1 to rowid.6 colid.1 to colid.6
   regiontotaln regiontotalyes regiontotalpcnt region1pcnt region2pcnt
region3pcnt.

save outfile='c:\mixedtests\meganf2a.sav'/keep=tableid count.1 count.2
count.3
   regiontotalyes region1pcnt region2pcnt region3pcnt regiontotalpcnt/
   rename=(count.1 count.2 count.3=region1n region2n region3n).


****************************************************************************
******.
match files file='c:\mixedtests\meganf1a.sav'/
   file='c:\mixedtests\meganf2a.sav'/by tableid.
execute.

value labels tableid 1 'dv1' 2 'dv2' 3 'dv3'.
summarize tables=tableid gendertotalyes gendertotalpcnt malen malepcnt
femalen
   femalepcnt
   regiontotalyes regiontotalpcnt region1n region1pcnt region2n region2pcnt
   region3n region3pcnt/format=list.


Commentary.
The basic operation is to read each file written out from the crossstabs,
perform the required computations to compute totals and percents, and save a
new file keeping the relevant variables. The two  new files are put together
by tableid. I used Summarize to output the results thinking it would be more
accessible for you. Of course, the Summarize command output can be exported
and worked over in word.

I think I have every thing correct but check the results carefully and let
me know if you have problems.


Gene Maguin


-----Original Message-----
From: Megan Misovic [mailto:[hidden email]]
Sent: Friday, August 27, 2010 6:18 PM
To: 'Gene Maguin'
Subject: RE: Combining the output from multiple crosstabs

Hi Gene,

You are absolutely right - I am interested only in the 'yes' responses.

In my case I am analysing a health survey, so in most cases I am only
interested if the person has a certain health condition.

So, for example I would like to be able to report:

Condition | Provincial Total Count | Provincial Total Percentage | Men count

Diabetes | 56 | 3% | 34|
High Blood Pressure | 246 | 5% | 176
Etc.

So- the 'provincial total count' would be the total number of 'yes' cases in
the data (this is important because we cannot report small numbers), the
'provincial total percentage' would be the percent of all respondents who
had the condition, the 'men count' would be the number of men with the
condition and so on.

Does this help describe a little better what I'm looking for?

I started doing all this by hand and I realized that I would be foolish to
try to do this with brute force - there must be a way to automate it!

Thanks for your help Gene!

Megan

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Gene Maguin
Sent: August 27, 2010 3:05 PM
To: [hidden email]
Subject: Re: Combining the output from multiple crosstabs

Megan,

Before I offer a suggestion I'd like to make sure I understand your desired
output because either I don't or you have an error in your description.

Let's say you had a question like 'Steven Harper should be reelected?
(Yes/No). Your crosstab would be a 2 by 2 plus marginal totals. Exactly what
numbers from the crosstab do you want to be in your table? You have three
choices: 'No' counts and percents for males, females and total sample; 'Yes'
counts and percents for the same, and total column counts and percents for a
gender by item crosstab. Also, what happens for DK/NA/Refused responses?

So I'm wondering if you aren't looking for a final result that looks like
this.

Variable Name | Value/Value label | Overall Count | Overall % |
    Men count | Men % | Women count | Women % |
    Geographic region #1 count | Geographic region #2 % | etc.


Thanks, Gene Maguin



>>I am doing survey analysis and for each of the variables in my survey I am
interested in both the total count, total weighted percentage, as well as
the count and weight broken down by gender and geographical region.

My survey has hundreds of variables.  I feel there must be a better way of
disaggregating each of these by gender and region than by manually running
crosstabs and copying the counts and percentages into an excel spreadsheet.

What I would ideally like is a table structured like this:

Variable Name | Overall Count | Overall % | Men count | Men % | Women count
| Women % | Geographic region #1 count | Geographic region #2 % | etc.

Does anyone have any advice?  I don't have the budget to buy SPSS 'tables'
and I have been unable to find any help on-line.

Thanks and kind regards,

Megan

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Combining the output from multiple crosstabs

Bruce Weaver
Administrator
Gene Maguin wrote
Megan,

Ok, that is very helpful. Thanks.

So far as I know, there are two methods to get to what you want. Both
involve complexity. This is going to be lengthy so bear with me. I made up a
little dataset to use and it embodies the following expectations: 1) you
have missing data; 2) all DVs are yes/no response format.

--- snip ---
Here's a different approach that gives what Megan asks for, I think.  I'll use Gene's data to illustrate.

data list / id sex region dv1 dv2 dv3(f2.0,5(1x,f1.0)).
begin data
01 1 1 9 2 1
02 9 1 2 1 1
03 1 2 2 9 2
04 2 9 2 1 2
05 1 3 1 2 1
06 2 3 1 1 9
07 1 3 2 2 2
08 2 2 1 1 1
09 9 1 2 9 2
10 2 9 2 1 2
11 1 2 9 2 1
12 9 3 1 1 1
13 1 9 2 9 2
14 2 1 2 1 1
15 1 1 1 2 9
end data.

* I prefer 1=Yes, 0=No coding, because then SUMS give the count of Yes responses.
recode sex dv1 to dv3 (2=0) (else=copy).
exe.

*  dv1 to dv2: 1=yes; 0=no. sex: 1=male; 0=female.
missing values sex region dv1 dv2 dv3(9).

* Megan said, "provincial total count' would be the total number of 'yes' cases in the data".
* This suggests that the data are from one province only, so no need to handle multiple provinces.

* Use AGGREGATE to get Provincial Total Count and Provincial Total Percentage .
AGGREGATE
  /OUTFILE= "C:\temp\province.sav"
  /BREAK=
  /PTC1 = SUM(dv1)
  /PTC2 = SUM(dv2)
  /PTC3 = SUM(dv3)
  /PTpercent1 = PGT(dv1 0)
  /PTpercent2 = PGT(dv2 0)
  /PTpercent3 = PGT(dv3 0)
.

* Now get the total count for men only .

temporary.
select if (sex EQ 1).
AGGREGATE
  /OUTFILE= "C:\temp\men.sav"  /BREAK=
  /MenTC1 = SUM(dv1)
  /MenTC2 = SUM(dv2)
  /MenTC3 = SUM(dv3)

get file = "C:\temp\province.sav" .
match files
 file = * /
 file = "C:\temp\men.sav" .
exe.


VARSTOCASES
  /MAKE ProvTotCount FROM PTC1 PTC2 PTC3
  /MAKE ProvTotPercent FROM PTpercent1 PTpercent2 PTpercent3
  /MAKE MenTotCount FROM MenTC1 MenTC2 MenTC3
  /INDEX=Condition(3)
  /KEEP=
  /NULL=KEEP.

formats ProvTotCount MenTotCount (f5.0) / ProvTotPercent (f5.2).
 
variable labels
 condition "Condition"
 ProvTotCount "Provincial Total Count"
 ProvTotPercent "Provincial Total Percent"
 MenTotCount "Total Count for Men"
.
value labels Condition
 1 "Condition 1 label"
 2 "Condition 2 label"
 3 "Condition 3 label"
.

Report
  /FORMAT=CHWRAP(ON) PREVIEW(OFF) CHALIGN(BOTTOM) UNDERSCORE(ON) ONEBREAKCOLUMN(OFF) CHDSPACE(1)
    SUMSPACE(0) AUTOMATIC LIST BRKSPACE(0) PAGE(1) MISSING'.' LENGTH(1, 39) ALIGN(LEFT) TSPACE(1)
    FTSPACE(1) MARGINS(1,55)
  /TITLE=LEFT 'Title Here' RIGHT 'Page )PAGE'
  /VARIABLES Condition (LABELS) (LEFT) (OFFSET(0))(20)
    ProvTotCount (VALUES) (RIGHT) (OFFSET(0)) (10)
    ProvTotPercent (VALUES) (RIGHT) (OFFSET(0)) (10)
    MenTotCount (VALUES) (RIGHT) (OFFSET(0)) (8) .


HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).