Reading in Hierarchical Data

Reading in Hierarchical Data

James Wilson-24
Greetings All:

This is my first attempt at using a listserv, so please forgive any
gaffes.

The issue:

As part of a research project, I was provided with an SPSS data file
with 15 variables, all numeric except one date-formatted variable
(mm/dd/yyyy). The data file is what I would refer to as hierarchical.
There are approximately 800 cases, but close to 7000 records. Any
individual case may have between 1 and 85 records. Each record is not a
'level' per se, but rather refers to an individual event. The file is
sorted by a unique 8-digit identifier associated with each case, and
sorted within cases by the event date (earliest to latest). There is no
record number associated with the individual records. Ideally, I would
like to convert the file to a rectangular file.

Does anyone have any suggestions?

Best

Jim

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Reading in Hierarchical Data

Hector Maletta
James,
For a beginner your question is pretty well formulated, though some
information is missing for a complete response.
Your file, in fact, is already a rectangular file: each row is an event and
each column is a variable or attribute of those events. When you say you
want it to be a rectangular file, you probably mean that you want a file
with one row per individual (i.e. 800 rows) instead of one row per event
(7000 rows).
If that is so, you may in turn mean two or three different things:
1. The 800-individual file may contain all the events of each individual,
side by side. If each event is characterized by, say, K variables, you would
have to reserve 85 × K columns for each individual (besides the individual's
ID). Some individuals would have data in only one event, others in only two,
and some in up to 85 events. It would be a very unwieldy file indeed.
Frankly, I do not see the usefulness of this arrangement, although it may be
useful for some specific purpose.
2. You may obtain SUMMARY MEASURES of all events pertaining to each
individual. For instance: the number of events for each individual, the time
elapsed between each individual's first and last events, the average value
(for each individual) of some variable characterizing the events, and so on.
This can be done quite easily with the AGGREGATE command, using the
individual ID as the break variable and defining the summary (aggregate)
variables that you desire.
3. You do not mention it, but you may also have information ABOUT
INDIVIDUALS (age, sex, occupation and so on) besides information about
the events. This information is relevant for all events pertaining to a
certain individual. If you have that information, where is it? Is it in a
special record for the individual, or is it repeated for every particular
event pertaining to each individual? In the first case, you may want
to select out the records of individuals as a separate file, and then merge
them with the rest, assigning the information about each individual to all
events pertaining to that individual, so you can analyze relationships
between individual properties and event properties (do older individuals
have more severe events than the young? Do women have more than men?). This
requires a combination of the SELECT IF, SAVE and MATCH FILES /TABLE
commands, on which detailed advice could be given if needed.
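
For the third case, the SELECT IF / SAVE / MATCH FILES sequence might be
sketched as follows. This is only an illustration under stated assumptions:
the variable names (id, rectype, age, sex) and the file names are
hypothetical, and it assumes a variable rectype that distinguishes person
records (1) from event records (2), which the original post does not
describe. Both saved files stay sorted by id because the source file
already is, which MATCH FILES requires when BY is used.

```
* Hypothetical names: id, rectype, age, sex, persons.sav, events.sav.
* Step 1: save the person-level records to their own file.
TEMPORARY.
SELECT IF (rectype = 1).
SAVE OUTFILE='persons.sav' /KEEP=id age sex.

* Step 2: save the event-level records, without the person variables.
TEMPORARY.
SELECT IF (rectype = 2).
SAVE OUTFILE='events.sav' /DROP=age sex rectype.

* Step 3: copy each person's attributes onto all of that person's events.
MATCH FILES
  /FILE='events.sav'
  /TABLE='persons.sav'
  /BY id.
EXECUTE.
```

Naming the person file on TABLE rather than FILE makes it a lookup table:
its rows are matched to every event row with the same id, and it
contributes no rows of its own.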
Hope this helps you to clarify exactly what you want.

Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
James Wilson
Sent: 29 October 2008 12:02
To: [hidden email]
Subject: Reading in Hierarchical Data


Re: Reading in Hierarchical Data

Richard Ristow
At 10:01 AM 10/29/2008, James Wilson wrote:

>[I have] an SPSS data file [that] I would refer to as hierarchical.
>There are approximately 800 cases, but close to 7000 records. Any
>individual case may have between 1 and 85 records. Each record is
>not a 'level' per se, but rather refers to individual events. The
>file is sorted by a unique 8-number identifier associated with each
>case, and sorted within cases by the event date (earliest to latest).

See Hector Maletta's excellent remarks.

It is very common to have situations like this: many individuals,
each of whom has multiple events recorded, the number varying from
individual to individual. A classic case is medical records, with the
history of office visits for each patient. The term 'hierarchical' is
accurate, though we don't seem to be using it often.

>[In this file], each record is not a 'level' per se, but rather
>refers to individual events.

That is commonly called 'long' data organization. The alternative is
one record per individual, with multiple groups of variables, one
group for each event; that is called 'wide' organization.

You write,

>I would like to convert the file to a rectangular file.

As Hector wrote, it's not clear what you mean. But you may mean that
you want to convert your file from long to wide organization. If so,

* The SPSS command CASESTOVARS does precisely that. See the Command
Syntax Reference; or, from the menus, select
Data > Restructure > Restructure selected cases into variables

* However, it may not be a good idea. Many SPSS analyses, and most
data manipulations, are easier in a long-form file. Notably, it's far
easier to calculate individual summary statistics from a long-form
file. The main SPSS command for this is AGGREGATE; or, from the menus,
Data > Aggregate...
(AGGREGATE is one command for which writing syntax is usually easier
than using the menus.)
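
As a rough sketch of both routes in syntax, assuming hypothetical variable
names id, event_date and severity (none of these names come from the
original file) and a hypothetical output file name:

```
* Hypothetical names: id, event_date, severity, person_summary.sav.
* CASESTOVARS expects the file to be sorted by the ID variable.
SORT CASES BY id event_date.

* Route 1: one summary row per individual, written to a separate file.
* The active long-form file is left unchanged.
AGGREGATE
  /OUTFILE='person_summary.sav'
  /BREAK=id
  /n_events = N
  /first_date = MIN(event_date)
  /last_date = MAX(event_date)
  /mean_severity = MEAN(severity).

* Route 2: restructure the active file from long to wide.
CASESTOVARS
  /ID=id.
```

After CASESTOVARS, each varying variable is spread over numbered columns
(for example severity.1, severity.2, and so on), so individuals with few
events will have many system-missing columns.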

Good luck! And, as Hector already invited, post further with
follow-up questions.

-Richard Ristow

A side issue:

>[The file has] 15 variables, all are numeric except one date
>formatted variable (mm/dd/yyyy).

*Probably* that is an SPSS date variable. If so, it has no inherent
format. It can be displayed as 'mm/dd/yyyy', and it sounds like
its format has been specified so it is. But it can also be displayed
in any of the other formats in which SPSS can display dates, without
changing its underlying values.
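
For instance, assuming the variable is named event_date (a hypothetical
name), the display format could be switched back and forth like this
without touching the stored values:

```
* Hypothetical variable name: event_date.
* Display the dates European style, as dd.mm.yyyy.
FORMATS event_date (EDATE10).

* Display them American style again, as mm/dd/yyyy.
FORMATS event_date (ADATE10).
```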
