SPSSX Discussion

How to best represent data in SPSS

Ben-142

How to best represent data in SPSS

Hi. I'd greatly appreciate some help with this issue. I'm organising a
litter count in which litter items are categorised into 83 types (e.g. small
plastic bottles (SPB), large plastic bottles (LPB), cigarette packets (CP),
etc.). Furthermore, I need to record brands within each litter type and the
number of items of each brand (e.g. 11 small plastic bottles with the Coke
brand (SPB_COKE), 6 small plastic bottles with the Pepsi brand (SPB_PEPSI),
4 large plastic bottles with the Coke brand (LPB_COKE), 3 cigarette packets
with the Camel brand (CP_CAMEL) and so forth). The litter count is conducted
at 983 sites around the country.

This data is a bit difficult to represent using SPSS's 'flat' structure. I
presume that the best way of doing so would be to have each case as a site
where litter was counted. Then, each brand within each litter type is a
separate variable.

For example:

SITE  SPB_COKE  SPB_PEPSI  LPB_COKE  LPB_PEPSI  ...  CP_CAMEL
1     11        6          4         0          ...  3
2     9         4          0         2          ...  0
3     0         1          0         5          ...  2
...

The problem with this is that it will result in a potentially huge number of
variables, and it seems like there must be a better way of representing this
information. During the last litter count we recorded hundreds of brands.
Multiplying these by 83 different litter types results in a data array that
will be difficult to manage and work with.

Furthermore, when I am asked to provide totals by brand, I'll have to trawl
through the data file and find all variables with the brand 'COKE' appended
to the item type name, and create new variables such as 'Total_COKE' which
are the computed sum of all values in all of the 'COKE' variables (i.e.
Total_COKE = SPB_COKE + LPB_COKE and so forth).

I just want to know if there's a better way of representing this
information. As mentioned, I need to be able to provide totals, not just of
brands (e.g. total COKE regardless of item type) but also of item types
(e.g. total SPBs regardless of brand). There's got to be a simpler way of
storing this information.

I'd be very grateful if anyone could help me, or even let me know if there
ISN'T a better way so I will stop puzzling over it and just accept my fate!

Thanks very much.

Richard Ristow

Re: How to best represent data in SPSS

At 06:14 PM 5/4/2007, Ben wrote:

>I'm organising a litter count in which litter items are categorised
>into 83 types (e.g. small plastic bottles (SPB), large plastic bottles
>(LPB), cigarette packets (CP), etc.). Furthermore, I need to record
>brands within each litter type and the number of items of each brand
>(e.g. 11 small plastic bottles with the Coke brand (SPB_COKE), 6 small
>plastic bottles with the Pepsi brand (SPB_PEPSI), 4 large plastic
>bottles with the Coke brand (LPB_COKE), 3 cigarette packets with the
>Camel brand (CP_CAMEL) and so forth). The litter count is conducted at
>983 sites around the country.

So, you're studying litter in Australia? I could find you plenty here
in Providence, Rhode Island, if you need a comparison group.

>This data is a bit difficult to represent using SPSS's 'flat'
>structure. I presume that the best way of doing so would be to have
>each case as a site where litter was counted. Then, each brand within
>each litter type is a separate variable.
>
>For example:
>
>SITE  SPB_COKE  SPB_PEPSI  LPB_COKE  LPB_PEPSI  ...  CP_CAMEL
>1     11        6          4         0          ...  3
>2     9         4          0         2          ...  0
>3     0         1          0         5          ...  2
>...
>
>The problem with this is that it will result in a potentially huge
>number of variables, and it seems like there must be a better way of
>representing this information. During the last litter count we
>recorded hundreds of brands. Multiplying these by 83 different litter
>types results in a data array that will be difficult to manage and
>work with.

What you're talking about is called 'wide' data organization, and
you've correctly listed its disadvantages. I recommend - many would
recommend - 'long' data organization, spreading over many records
rather than many variables. In your case, to represent what you've
given above would take four variables:

Site  Litter_Cat  Brand  Count
1     SPB         Coke   11
1     SPB         Pepsi  6
1     LPB         Coke   4
1     LPB         Pepsi  0   <probably, no record>
...
1     CP          Camel  3
2     SPB         Coke   9

('Litter_Cat' is your litter categories, not to be confused with
kitty-litter.)
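
As a rough illustration of the 'long' layout, here is a minimal Python sketch (not SPSS syntax) that turns the three wide rows from the example table into Site / Litter_Cat / Brand / Count records. The function name `wide_to_long` is made up for this example, and it assumes every column name follows the TYPE_BRAND pattern with no underscore inside the type code. Zero counts are simply dropped, matching the "probably, no record" remark above.

```python
# Wide rows copied from the example table in the question.
wide_rows = [
    {"SITE": 1, "SPB_COKE": 11, "SPB_PEPSI": 6, "LPB_COKE": 4, "LPB_PEPSI": 0, "CP_CAMEL": 3},
    {"SITE": 2, "SPB_COKE": 9, "SPB_PEPSI": 4, "LPB_COKE": 0, "LPB_PEPSI": 2, "CP_CAMEL": 0},
    {"SITE": 3, "SPB_COKE": 0, "SPB_PEPSI": 1, "LPB_COKE": 0, "LPB_PEPSI": 5, "CP_CAMEL": 2},
]


def wide_to_long(rows):
    """Return one (site, litter_cat, brand, count) tuple per non-zero cell."""
    long_records = []
    for row in rows:
        for column, count in row.items():
            if column == "SITE" or count == 0:
                continue  # a zero count gets no record at all
            # Assumption: the type code is everything before the first underscore.
            litter_cat, brand = column.split("_", 1)
            long_records.append((row["SITE"], litter_cat, brand.title(), count))
    return long_records


long_records = wide_to_long(wide_rows)
for record in long_records:
    print(record)
```

The number of columns stays at four no matter how many brands or litter types turn up; only the number of records grows.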

>Furthermore, when I am asked to provide totals by brand, I'll have to
>trawl through the data file and find all variables with the brand
>'COKE' appended to the item type name, and create new variables such
>as 'Total_COKE' which are the computed sum of all values in all of the
>'COKE' variables (i.e. Total_COKE = SPB_COKE + LPB_COKE and so forth).

And this would be a piece of cake with the structure I've outlined
above, and AGGREGATE.
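
To make the idea concrete outside SPSS, the following Python sketch sums Count by brand and by litter type over long records built from the example table. It is only an analogue of what AGGREGATE would do with Brand (or Litter_Cat) as the break variable and a sum of Count; the helper name `total_by` is hypothetical.

```python
from collections import defaultdict

# Long records (site, litter_cat, brand, count) built from the example table;
# zero counts are left out.
long_records = [
    (1, "SPB", "Coke", 11), (1, "SPB", "Pepsi", 6), (1, "LPB", "Coke", 4),
    (1, "CP", "Camel", 3),
    (2, "SPB", "Coke", 9), (2, "SPB", "Pepsi", 4), (2, "LPB", "Pepsi", 2),
    (3, "SPB", "Pepsi", 1), (3, "LPB", "Pepsi", 5), (3, "CP", "Camel", 2),
]


def total_by(records, field_index):
    """Sum the count (last field) for each distinct value of one field."""
    totals = defaultdict(int)
    for record in records:
        totals[record[field_index]] += record[3]
    return dict(totals)


brand_totals = total_by(long_records, 2)  # total per brand, any litter type
type_totals = total_by(long_records, 1)   # total per litter type, any brand

print(brand_totals)
print(type_totals)
```

The same function answers both questions from the original post, because brand and litter type are now ordinary values in a column rather than parts of variable names.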

>I'd be very grateful if anyone could help me, or even let me know if
>there ISN'T a better way so I will stop puzzling over it and just
>accept my fate!

I think this should do you just fine. It'll give you a huge explosion
in number of cases; but SPSS handles a great many cases without
trouble.

Go for it!
Richard

Ben-142

Re: How to best represent data in SPSS

On Sat, 5 May 2007 02:34:03 -0400, Richard Ristow <[hidden email]>
wrote:

>What you're talking about is called 'wide' data organization, and
>you've correctly listed its disadvantages. I recommend - many would
>recommend - 'long' data organization, spreading over many records
>rather than many variables.
>[...]

Richard, that's great. So simple! I'm kicking myself for not thinking of it.
In fact I've been playing around with this sort of data, and found that if I
weight cases by 'Count', frequency tables and crosstabulation (e.g. brand by
litter type) provide pretty much all the information I need.

Thanks so much.
Ben.
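
For readers who want to see what weighting by 'Count' amounts to, here is a small Python sketch (an illustration only, not SPSS output) of a brand by litter type crosstabulation over the example data. Each record contributes its Count to the cell rather than 1, which is the effect of weighting cases by that variable.

```python
from collections import Counter

# Long records (site, litter_cat, brand, count) built from the example table.
long_records = [
    (1, "SPB", "Coke", 11), (1, "SPB", "Pepsi", 6), (1, "LPB", "Coke", 4),
    (1, "CP", "Camel", 3),
    (2, "SPB", "Coke", 9), (2, "SPB", "Pepsi", 4), (2, "LPB", "Pepsi", 2),
    (3, "SPB", "Pepsi", 1), (3, "LPB", "Pepsi", 5), (3, "CP", "Camel", 2),
]

# Weighted crosstab: each record adds its count, not 1, to its cell.
crosstab = Counter()
for _site, litter_cat, brand, count in long_records:
    crosstab[(brand, litter_cat)] += count

brands = sorted({brand for brand, _ in crosstab})
types = sorted({litter_cat for _, litter_cat in crosstab})

# Print a plain-text table; cells with no records show as 0.
print(f"{'Brand':<8}" + "".join(f"{t:>6}" for t in types))
for brand in brands:
    print(f"{brand:<8}" + "".join(f"{crosstab[(brand, t)]:>6}" for t in types))
```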