complex merge question

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

complex merge question

bgreen
Hello,

I was hoping for some advice regarding if and how the following can be
performed in SPSS.

I have been provided two data files in EXCEL format (originally the data
was in ACCESS). File A contains variables such as 'DOB', 'court',
'hearing_date', ''offence code', sentence' 'outcome' pertaining to police
contacts. File B contains mental health presentations and includes 'DOB',
'presentation' 'outcome'. Id number and DOB are common to both files,
though sometimes there is date of birth discrepancies.

Where things are complicated is that for any individual, each police
contact is recorded on a separate row and where different outcomes resulted
there may be more than one row per contact. Similarly for mental health
presentation, there are multiple presentations which sometimes have
different outcomes recorded - so there would be two rows of data.

Merging two files with multiple rows per individual (but not necessarily
equal numbers of rows per file) seemed a complex task. For example, one
individual may have 2 incidents of police contact (=2 rows of records in
file A) but another person 15; similarly for file B, there is not one row
per individual but varying numbers of rows for presentation.  I am unsure
how to proceed with this. One option seemed to be to transform the data so
that each individual only had one row of data (which potentially raises
other problems by significantly increasing the numbers of variables) or
performing a merge preserving the original structure of each file, and
having Id's align


Any assistance regarding how this task might be performed is appreciated. I
believe I can access versions 11 & 12, possibly 13.

regards

Bob
Reply | Threaded
Open this post in threaded view
|

A Word utility to ease writing of syntax

VJ-6
A Word utility to ease writing of syntax for labels, recoding, and
formatting text output.  Useful for creating syntax for variable and value
labels, an otherwise painful task.

    a.. Download SPSS coding assistant http://vgupta.com/z_spss_tools.doc

  Choose the option "Save" and not "Run".  After saving the file, open it
with Microsoft Word.  You may have to lower the "Macro Security" protection
if you get an error message.
Reply | Threaded
Open this post in threaded view
|

Re: A Word utility to ease writing of syntax

Hal 9000
Thanks VJ--I look forward to hearing back from the other SPSS'ers. I'm
not brave enough to open a macro that somebody else wrote! (They could
be a SAS person, you know).
-Gary

On 11/3/06, VJ <[hidden email]> wrote:

> A Word utility to ease writing of syntax for labels, recoding, and
> formatting text output.  Useful for creating syntax for variable and value
> labels, an otherwise painful task.
>
>    a.. Download SPSS coding assistant http://vgupta.com/z_spss_tools.doc
>
>  Choose the option "Save" and not "Run".  After saving the file, open it
> with Microsoft Word.  You may have to lower the "Macro Security" protection
> if you get an error message.
>
Reply | Threaded
Open this post in threaded view
|

Re: complex merge question

Richard Ristow
In reply to this post by bgreen
At 05:57 PM 11/3/2006, Bob Green wrote:

>I have two data files. File A contains variables pertaining to police
>contacts. File B contains mental health presentations. Id number and
>DOB are common to both files, though sometimes there is date of birth
>discrepancies.
>FWhere things are complicated is that for any individual, each police
>
>Each police contact is recorded on a separate row and there may [even]
>be more than one row per contact. Similarly for mental health
>presentation, there are multiple presentations which sometimes have
>different outcomes recorded - so there would be two rows of data.
>
>Merging two files with multiple rows per individual (but not
>necessarily equal numbers of rows per file) seemed a complex task.

It is a complex task, but the first question is, what kind of merge do
you want?

If each mental health presentation is associated with a specific police
record, then you merge on that basis, and it's not too bad.

You could do a many-to-many merge, merging EVERY mental-health
presentation with EVERY police record. Somebody with 5 police records
and 3 mental-health presentations would then have 15 records. If that's
what you want, it can be done - it's tricky but doable in SPSS. But it
doesn't sound right; among other things, it can give different people
radically different numbers of records for reasons that aren't well
tied to what happened to them.

What questions are you trying to address? That will help a lot,
thinking about how you want your data structured.

-Wishing you best of luck,
  Richard
Reply | Threaded
Open this post in threaded view
|

Re: complex merge question

bgreen
Richard,

Thanks for your reply.

The basic question of interest is whether a mental health admission makes
any difference to subsequent police contact. Once the data is formatted,
the idea at this stage is to have a series of variables relating to
seriousness of offence pre and post admission. Is this too vague?  Seeing a
sample of the data may give a better idea of what it looks like.  I think
ultimately, we will have some summary variables - offence against person
-pre and post etc, but it will be easier to see what is going on if the
data was all in one file

regards

Bob
Reply | Threaded
Open this post in threaded view
|

Re: complex merge question

statisticsdoc
Stephen Brand
www.statisticsdoc.com


Bob,

This is an interesting dataset that presents many possibilities.  Are you
particularly interested in the first admission as a watershed event?  One
thought is that you could, in broad strokes:

1.) utilize the mental health record file to determine the date of first
admission, resulting in one mental health record per  case;
2.) then do a one to many match between the date of first admission file and
the police records,
3.) then code the police records as occuring before or after the first
admission,
4.) then aggregate pre-admission and post-admission records separately with
regard to a severity indicator.

Of course, I do not want to presume that this would address your question.
Are you interested in investigating the idea that mental health admissions
precede a worsening of criminal activity among individuals who have a prior
arrest history?  There is also the possibility that admissions might, in
some contexts, precede a decrease in certain types of criminal activity.
Along these lines, some researchers working with de-institionalized patients
have suggested that the general population of people who are hospitalized
for mental illness are, compared with the general population, more likely to
be victims than perpertators of serious crimes, although it is not clear how
well this statement applies to individuals who have prior arrest histories.

HTH,

Stephen Brand


For personalized and professional consultation in statistics and research
design, visit
www.statisticsdoc.com


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
Bob Green
Sent: Saturday, November 04, 2006 6:33 PM
To: [hidden email]
Subject: Re: complex merge question


Richard,

Thanks for your reply.

The basic question of interest is whether a mental health admission makes
any difference to subsequent police contact. Once the data is formatted,
the idea at this stage is to have a series of variables relating to
seriousness of offence pre and post admission. Is this too vague?  Seeing a
sample of the data may give a better idea of what it looks like.  I think
ultimately, we will have some summary variables - offence against person
-pre and post etc, but it will be easier to see what is going on if the
data was all in one file

regards

Bob
Reply | Threaded
Open this post in threaded view
|

Re: complex merge question

Richard Ristow
In reply to this post by bgreen
At 06:32 PM 11/4/2006, Bob Green wrote:

>The basic question of interest is whether a mental health admission
>makes any difference to subsequent police contact. The idea at this
>stage is to have a series of variables relating to seriousness of
>offence pre and post admission. Is this too vague?

It's enough to get going with. The crucial things you say are
"SUBSEQUENT police contact" and "PRE AND POST admission" (emphasis
added). That is, you're studying a single time series (for each
subject), in which events of both types may occur.

So you don't want to merge. You want to interleave the two kinds of
record, in chronological order.

Then, you can use LAG or, more likely, variables with LEAVE specified,
to carry over information, so for each police contact you have
information on for prior mental health can tell, for each police
contact, whatever you need about prior mental health presentations --
whether there have been any; how many; what was done; how long before
the police contact. You can also have information about police contacts
prior to the current one.

So, interleaving is your data organization. For what you do afterwards,
think what you intend. In particular, look at the advice and questions
that Stephen Brand has posted.

I think this is a step farther than you were: think of it as building a
single time series of the different events, rather than as a merge.

-Good luck,
  Richard
Reply | Threaded
Open this post in threaded view
|

Re: complex merge question

bgreen
In reply to this post by statisticsdoc
Richard & Stephen,

Basically Stephen's points 1-4 are what I have in mind. Given there are
over 16,000 records I am trying to find a way to more efficiently perform
the task.

Richard, interleaving is not a term I am familiar with. From your e-mail it
seems you are suggesting the steps suggested by Stephen can be performed
using command such as LAG or LEAVE. Is this correct?

regards

Bob


At 09:08 PM 4/11/2006 -0500, Statisticsdoc wrote:

>Stephen Brand
>www.statisticsdoc.com
>
>
>Bob,
>
>This is an interesting dataset that presents many possibilities.  Are you
>particularly interested in the first admission as a watershed event?  One
>thought is that you could, in broad strokes:
>
>1.) utilize the mental health record file to determine the date of first
>admission, resulting in one mental health record per  case;
>2.) then do a one to many match between the date of first admission file and
>the police records,
>3.) then code the police records as occuring before or after the first
>admission,
>4.) then aggregate pre-admission and post-admission records separately with
>regard to a severity indicator.
>
>Of course, I do not want to presume that this would address your question.
>Are you interested in investigating the idea that mental health admissions
>precede a worsening of criminal activity among individuals who have a prior
>arrest history?  There is also the possibility that admissions might, in
>some contexts, precede a decrease in certain types of criminal activity.
>Along these lines, some researchers working with de-institionalized patients
>have suggested that the general population of people who are hospitalized
>for mental illness are, compared with the general population, more likely to
>be victims than perpertators of serious crimes, although it is not clear how
>well this statement applies to individuals who have prior arrest histories.
>
>HTH,
>
>Stephen Brand
>
>
>For personalized and professional consultation in statistics and research
>design, visit
>www.statisticsdoc.com
>
>
>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
>Bob Green
>Sent: Saturday, November 04, 2006 6:33 PM
>To: [hidden email]
>Subject: Re: complex merge question
>
>
>Richard,
>
>Thanks for your reply.
>
>The basic question of interest is whether a mental health admission makes
>any difference to subsequent police contact. Once the data is formatted,
>the idea at this stage is to have a series of variables relating to
>seriousness of offence pre and post admission. Is this too vague?  Seeing a
>sample of the data may give a better idea of what it looks like.  I think
>ultimately, we will have some summary variables - offence against person
>-pre and post etc, but it will be easier to see what is going on if the
>data was all in one file
>
>regards
>
>Bob
Reply | Threaded
Open this post in threaded view
|

Re: complex merge question

Dennis Deck
In reply to this post by bgreen
I'm currently working on a problem that seems of have some similar
elements:  examine the relationship of a) Medicaid coverage on
utilization of opiate treatment and b) time in opiate treatment on
felony arrests.

For both models the independent and dependent variables vary over time
in complex ways if I follow them for 3 years.  Thus unless I treat this
as a longitudinal analysis with repeated monthly observations, it is
hard to appropriately model the effects.

I decided that the best approach was to do a multilevel analysis:

 Data structure:
        Create table of up to 36 monthly dummy records for each person
          Convert periods of Medicaid eligibility into monthly records
          Ditto for substance abuse treatment and arrest files
        Use MATCH FILES to do table lookups and merge the service and
           outcome data with the master file by client and month.

 Level 1 (Time):   36 monthly observations from an initial admission
with:
        Time (number of months from initial admission)
          Medicaid eligibility that month
          Treatment status that month
          Arrest status that month

 Level 2 (Person):  Person level covariates
          Prior history of treatment and arrest
          Propensity scores for Medicaid coverage and treatment
placement

While this is not how you have framed your question, you may want to
consider a more longitudinal alternative.  Most mental health conditions
are chronic (unless you are restricting attention to first admissions
for low severity cases).  Treatment is typically provided over a long
period of time but clients may be in and out of treatment.  There is
often a prior history of treatment.  While most studies of MH and SA
treatment still tend to focus on target episodes, there is growing
interest in a more longitudinal perspective.

Reframing the question changes how the file should be structured.

Dennis Deck, PhD
RMC Research Corporation
[hidden email]

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
Bob Green
Sent: Saturday, November 04, 2006 6:33 PM
To: [hidden email]
Subject: Re: complex merge question


Richard,

Thanks for your reply.

The basic question of interest is whether a mental health admission
makes
any difference to subsequent police contact. Once the data is formatted,
the idea at this stage is to have a series of variables relating to
seriousness of offence pre and post admission. Is this too vague?
Seeing a
sample of the data may give a better idea of what it looks like.  I
think
ultimately, we will have some summary variables - offence against person
-pre and post etc, but it will be easier to see what is going on if the
data was all in one file

regards

Bob
Reply | Threaded
Open this post in threaded view
|

Re: complex merge question

Richard Ristow
In reply to this post by bgreen
(And apologies for not responding yesterday. I was traveling, or tired
after traveling, the whole day.)

At 02:26 AM 11/5/2006, Bob Green wrote:

>Richard, interleaving is not a term I am familiar with.

It means putting the records in the same file, ordered by person and
date (hence 'interleaving'). See "BY Subcommand" in article "ADD FILES"
in the Command Syntax Reference.

>Basically Stephen's points 1-4 are what I have in mind. From your
>e-mail it seems the steps suggested by Stephen can be performed using
>command such as LAG or LEAVE. Is this correct?

Yes; in fact, a larger range of operations can be performed that way.
That's not to sell AGGREGATE short, however. It's very powerful and
very flexible, and you'll almost certainly use it, as well. Stephen's
suggestions are a special case that don't need interleaving.

Code below is untested, except as noted. It all assumes and requires,
. Files Police and MntlHlth contain variables Person and Date, and have
no other variables in common
. Both files are sorted by Person and Date.
. Variables the code creates do not conflict with any variables in
either file.

>>1.) utilize the mental health record file to determine the date of
>>first admission, resulting in one mental health record per case;

As Stephen wrote, this gives "one [summary] mental health record per
[person]: the person's first record. That has, inherently, two steps:
a.) Derive the single summary record from the mental-health file
b.) Attach that record to all police records for the person, using a
MATCH FILES in which the summary mental-health file is a TABLE.

Two ways to do the summary:
GET FILE = MmtlHlth.
AGGREGATE OUTFILE=I
    /BREAK=Person
    /MH_Frst 'Date of first mental-health record'
          =MIN(DATE).

The above produces a file with only two variables: Person and MH_Frst.
The following retains all variables from the first mental-health
record, and may be preferable:

ADD FILES
    /FILE=MntlHlth /RENAME=(DATE=MH_Frst)
    /FIRST=Visit1.
SELECT IF (Visit1 = 1).
VAR LABEL MH_Frst 'Date of first mental-health record'.

>>2.) then do a one to many match between the date of first admission
>>file and the police records,

Like this, with file produced by either of the above two methods:

MATCH FILES
    /TABLE=*
    /FILE=Police
    /BY PERSON.

>>3.) then code the police records as occurring before or after the
>>first admission,
>>4.) then aggregate pre-admission and post-admission records
>>separately with
>>regard to a severity indicator.

You can do a lot more with interleave logic. I'll give one example
only, which uses interleave, then AGGREGATE, to summarize mental-health
records between successive police encounters. It's tested. This is the
syntax; run output, with test data, is at the end.

*  This is the fundamental interleave operation. If mental-    .
*  health evaluation and police encounter are on the same day, .
*  it puts mental-health evaluation first. (Recommended.)      .

ADD FILES
    /File=MntlHlth
    /File=Police   /IN=Cop_Recd
    /BY Person Date.
. /**/ LIST.

*  Here's LEAVE: attach to each record the sequence number of  .
*  the current, or next following, police encounter.           .

NUMERIC   Encounter (F3).
VAR LABEL Encounter
     'Sequential #, next later police encounter'.
LEAVE     Encounter.

DO IF    MISSING(LAG(Person)).
*  First encounter for first person:       .
.  COMPUTE Encounter = 1.
ELSE IF  LAG(Person) NE Person.
*  First encounter for later persons:      .
.  COMPUTE Encounter = 1.
ELSE IF  LAG(Cop_Recd) EQ 1.
*  After a police encounter record, start  .
*  numbering for a new encounter.          .
.  COMPUTE Encounter = Encounter + 1.
END IF.
. /**/ LIST.

*  ............ Example:   ................................... .
*  Summarize mental-health records between successive police   .
*  encounters.                                                 .
*  (Many other summaries are possible, including all variables .
*  totals or means of variables from the mental-health records.).


*  Police records are dropped before AGGREGATE, so save them   .
*  in a scratch file.                                          .
*  (In SPSS 14/15, save in a dataset, but logic is more        .
*  complicated.)                                               .
*  "Police#" may well be a file handle.                        .

TEMPORARY.
SELECT IF Cop_Recd.
SAVE OUTFILE = Police#
    /DROP=Cop_Recd /* Would always be 1 in this file */
    /DROP=Datum    /* Drop all vbls specific to mental health */.


*  For AGGREGATE of mental-health records, drop police records .
SELECT IF NOT Cop_Recd.
. /*-- LIST.

*  Summary of mental-health records between encounters:        .
AGGREGATE OUTFILE=*
    /BREAK =  Person Encounter
    /N_Eval   '# of mental health evaluations since last encounter'
           =  N
    /LtstMhDt 'Date of latest evaluation preceding encounter'
           =  LAST(Date)
    /LtstDatm 'Datum on most recent mental-health encounter'
           =  LAST(Datum)
    /FrstMhDt 'Date of first evaluation since previous encounter'
           =  FIRST(Date)
    /FrstDatm 'Datum on first evaluation since previous encounter'
           =  FIRST(Datum).
. /*-- LIST.

*  Re-attach police records, and list:                         .
MATCH FILES
    /FILE=Police#
    /FILE=*
    /BY =  Person Encounter.
RECODE N_Eval (MISSING=0).
LIST.

..........................
Appendix I:   Draft output
* ...................................................          .

* .....  Test data: Police        ..... .
GET FILE=Police.
LIST.
|-----------------------------|---------------------------|
|Output Created               |06-NOV-2006 17:12:11       |
|-----------------------------|---------------------------|
C:\Documents and Settings\Richard\My Documents\Temporary
   \SPSS2006-11-05 Green - complex merge question - Police.SAV

Person        Date Cop_Data

     1  06-FEB-2006 Z
     1  02-MAR-2006 Y
     2  01-MAY-2006 X
     3  13-MAY-2006 W
     3  12-JUL-2006 V

Number of cases read:  5    Number of cases listed:  5


* .....  Test data: Mental Health ..... .

GET FIle=MntlHlth.
LIST.
|-----------------------------|---------------------------|
|Output Created               |06-NOV-2006 17:12:11       |
|-----------------------------|---------------------------|
C:\Documents and Settings\Richard\My Documents\Temporary
   \SPSS2006-11-05 Green - complex merge question - MntlHlth.SAV

Person        Date Datum

     1  13-JAN-2006 Alpha
     1  25-JAN-2006 Beta
     1  18-FEB-2006 Delta
     2  14-MAR-2006 Zeta
     2  26-MAR-2006 Eta
     2  07-APR-2006 Theta
     2  19-APR-2006 Iota
     3  25-MAY-2006 Mu
     3  06-JUN-2006 Nu
     3  18-JUN-2006 Xi
     3  30-JUN-2006 Omikron
     3  24-JUL-2006 Rho

Number of cases read:  12    Number of cases listed:  12


*  This is the fundamental interleave operation. If mental-    .
*  health evaluation and police encounter are on the same day, .
*  it puts mental-health evaluation first. (Recommended.)      .

ADD FILES
    /File=MntlHlth
    /File=Police   /IN=Cop_Recd
    /BY Person Date.
. /**/ LIST.
|-----------------------------|---------------------------|
|Output Created               |06-NOV-2006 17:12:11       |
|-----------------------------|---------------------------|

Person        Date Datum    Cop_Data Cop_Recd

     1  13-JAN-2006 Alpha                 0
     1  25-JAN-2006 Beta                  0
     1  06-FEB-2006          Z            1
     1  18-FEB-2006 Delta                 0
     1  02-MAR-2006          Y            1
     2  14-MAR-2006 Zeta                  0
     2  26-MAR-2006 Eta                   0
     2  07-APR-2006 Theta                 0
     2  19-APR-2006 Iota                  0
     2  01-MAY-2006          X            1
     3  13-MAY-2006          W            1
     3  25-MAY-2006 Mu                    0
     3  06-JUN-2006 Nu                    0
     3  18-JUN-2006 Xi                    0
     3  30-JUN-2006 Omikron               0
     3  12-JUL-2006          V            1
     3  24-JUL-2006 Rho                   0

Number of cases read:  17    Number of cases listed:  17


*  Here's LEAVE: attach to each record the sequence number of  .
*  the current, or next following, police encounter.           .

NUMERIC   Encounter (F3).
VAR LABEL Encounter
     'Sequential #, next later police encounter'.
LEAVE     Encounter.

DO IF    MISSING(LAG(Person)).
*  First encounter for first person:       .
.  COMPUTE Encounter = 1.
ELSE IF  LAG(Person) NE Person.
*  First encounter for later persons:      .
.  COMPUTE Encounter = 1.
ELSE IF  LAG(Cop_Recd) EQ 1.
*  After a police encounter record, start  .
*  numbering for a new encounter.          .
.  COMPUTE Encounter = Encounter + 1.
END IF.
. /**/ LIST.
|-----------------------------|---------------------------|
|Output Created               |06-NOV-2006 17:12:11       |
|-----------------------------|---------------------------|
Person        Date Datum    Cop_Data Cop_Recd Encounter

     1  13-JAN-2006 Alpha                 0         1
     1  25-JAN-2006 Beta                  0         1
     1  06-FEB-2006          Z            1         1
     1  18-FEB-2006 Delta                 0         2
     1  02-MAR-2006          Y            1         2
     2  14-MAR-2006 Zeta                  0         1
     2  26-MAR-2006 Eta                   0         1
     2  07-APR-2006 Theta                 0         1
     2  19-APR-2006 Iota                  0         1
     2  01-MAY-2006          X            1         1
     3  13-MAY-2006          W            1         1
     3  25-MAY-2006 Mu                    0         2
     3  06-JUN-2006 Nu                    0         2
     3  18-JUN-2006 Xi                    0         2
     3  30-JUN-2006 Omikron               0         2
     3  12-JUL-2006          V            1         2
     3  24-JUL-2006 Rho                   0         3

Number of cases read:  17    Number of cases listed:  17


*  ............ Example:   ................................... .
*  Summarize mental-health records between successive police   .
*  encounters.                                                 .
*  (Many other summaries are possible, including all variables .
*  totals or means of variables from the mental-health records.).


*  Police records are dropped before AGGREGATE, so save them   .
*  in a scratch file.                                          .
*  (In SPSS 14/15, save in a dataset, but logic is more        .
*  complicated.)                                               .
*  "Police#" may well be a file handle.                        .

TEMPORARY.
SELECT IF Cop_Recd.
SAVE OUTFILE = Police#
    /DROP=Cop_Recd /* Would always be 1 in this file */
    /DROP=Datum    /* Drop all vbls specific to mental health */.


*  For AGGREGATE of mental-health records, drop police records .
SELECT IF NOT Cop_Recd.
. /*-- LIST.

*  Summary of mental-health records between encounters:        .
AGGREGATE OUTFILE=*
    /BREAK =  Person Encounter
    /N_Eval   '# of mental health evaluations since last encounter'
           =  N
    /LtstMhDt 'Date of latest evaluation preceding encounter'
           =  LAST(Date)
    /LtstDatm 'Datum on most recent mental-health encounter'
           =  LAST(Datum)
    /FrstMhDt 'Date of first evaluation since previous encounter'
           =  FIRST(Date)
    /FrstDatm 'Datum on first evaluation since previous encounter'
           =  FIRST(Datum).
. /*-- LIST.

*  Re-attach police records, and list:                         .
MATCH FILES
    /FILE=Police#
    /FILE=*
    /BY =  Person Encounter.
RECODE N_Eval (MISSING=0).
LIST.
|-----------------------------|---------------------------|
|Output Created               |06-NOV-2006 17:12:14       |
|-----------------------------|---------------------------|
Per                      Encou
son        Date Cop_Data nter   N_Eval    LtstMhDt LtstDatm    FrstMhDt
FrstDatm

   1 06-FEB-2006 Z           1        2 25-JAN-2006
Beta     13-JAN-2006 Alpha
   1 02-MAR-2006 Y           2        1 18-FEB-2006
Delta    18-FEB-2006 Delta
   2 01-MAY-2006 X           1        4 19-APR-2006
Iota     14-MAR-2006 Zeta
   3 13-MAY-2006
W           1        0           .                    .
   3 12-JUL-2006 V           2        4 30-JUN-2006
Omikron  25-MAY-2006 Mu
   3           .             3        1 24-JUL-2006
Rho      24-JUL-2006 Rho

Number of cases read:  6    Number of cases listed:  6
* ............................................................ .

..........................
Appendix II:  Test data
*  Test data; relatively complicated   .
NEW FILE.
DATA LIST LIST /
   Person(N3) Cop_Recd(F2) Date(ADATE12)  Datum(A8).
BEGIN DATA
001     0       01/01/2006  Alpha
001     0       01/01/2006  Beta
001     1       01/01/2006  Z
001     0       01/01/2006  Delta
001     1       01/01/2006  Y
002     0       01/01/2006  Zeta
002     0       01/01/2006  Eta
002     0       01/01/2006  Theta
002     0       01/01/2006  Iota
002     1       01/01/2006  X
003     1       01/01/2006  W
003     0       01/01/2006  Mu
003     0       01/01/2006  Nu
003     0       01/01/2006  Xi
003     0       01/01/2006  Omikron
003     1       01/01/2006  V
003     0       01/01/2006  Rho
END DATA.
COMPUTE DATE=DATE+TIME.DAYS($CASENUM*12).
FORMATS DATE (DATE11).
. /*-- LIST.
TEMPORARY.
SELECT IF COP_RECD EQ 1.
SAVE OUTFILE=Police
   /RENAME = (Datum = Cop_Data)
   /DROP   = COP_RECD.

TEMPORARY.
SELECT IF COP_RECD EQ 0.
SAVE OUTFILE=MntlHlth
   /DROP   = COP_RECD.
Reply | Threaded
Open this post in threaded view
|

Re: complex merge question

statisticsdoc
Richard,

Well done!

Stephen Brand

For personalized and professional consultation in statistics and research
design, visit
www.statisticsdoc.com


-----Original Message-----
From: Richard Ristow [mailto:[hidden email]]
Sent: Monday, November 06, 2006 5:26 PM
To: Bob Green; [hidden email]
Cc: [hidden email]
Subject: Re: complex merge question


(And apologies for not responding yesterday. I was traveling, or tired
after traveling, the whole day.)

At 02:26 AM 11/5/2006, Bob Green wrote:

>Richard, interleaving is not a term I am familiar with.

It means putting the records in the same file, ordered by person and
date (hence 'interleaving'). See "BY Subcommand" in article "ADD FILES"
in the Command Syntax Reference.

>Basically Stephen's points 1-4 are what I have in mind. From your
>e-mail it seems the steps suggested by Stephen can be performed using
>command such as LAG or LEAVE. Is this correct?

Yes; in fact, a larger range of operations can be performed that way.
That's not to sell AGGREGATE short, however. It's very powerful and
very flexible, and you'll almost certainly use it, as well. Stephen's
suggestions are a special case that don't need interleaving.

Code below is untested, except as noted. It all assumes and requires,
. Files Police and MntlHlth contain variables Person and Date, and have
no other variables in common
. Both files are sorted by Person and Date.
. Variables the code creates do not conflict with any variables in
either file.

>>1.) utilize the mental health record file to determine the date of
>>first admission, resulting in one mental health record per case;

As Stephen wrote, this gives "one [summary] mental health record per
[person]: the person's first record. That has, inherently, two steps:
a.) Derive the single summary record from the mental-health file
b.) Attach that record to all police records for the person, using a
MATCH FILES in which the summary mental-health file is a TABLE.

Two ways to do the summary:
GET FILE = MmtlHlth.
AGGREGATE OUTFILE=I
    /BREAK=Person
    /MH_Frst 'Date of first mental-health record'
          =MIN(DATE).

The above produces a file with only two variables: Person and MH_Frst.
The following retains all variables from the first mental-health
record, and may be preferable:

ADD FILES
    /FILE=MntlHlth /RENAME=(DATE=MH_Frst)
    /FIRST=Visit1.
SELECT IF (Visit1 = 1).
VAR LABEL MH_Frst 'Date of first mental-health record'.

>>2.) then do a one to many match between the date of first admission
>>file and the police records,

Like this, with file produced by either of the above two methods:

MATCH FILES
    /TABLE=*
    /FILE=Police
    /BY PERSON.

>>3.) then code the police records as occurring before or after the
>>first admission,
>>4.) then aggregate pre-admission and post-admission records
>>separately with
>>regard to a severity indicator.

You can do a lot more with interleave logic. I'll give one example
only, which uses interleave, then AGGREGATE, to summarize mental-health
records between successive police encounters. It's tested. This is the
syntax; run output, with test data, is at the end.

*  This is the fundamental interleave operation. If mental-    .
*  health evaluation and police encounter are on the same day, .
*  it puts mental-health evaluation first. (Recommended.)      .

ADD FILES
    /File=MntlHlth
    /File=Police   /IN=Cop_Recd
    /BY Person Date.
. /**/ LIST.

*  Here's LEAVE: attach to each record the sequence number of  .
*  the current, or next following, police encounter.           .

NUMERIC   Encounter (F3).
VAR LABEL Encounter
     'Sequential #, next later police encounter'.
LEAVE     Encounter.

DO IF    MISSING(LAG(Person)).
*  First encounter for first person:       .
.  COMPUTE Encounter = 1.
ELSE IF  LAG(Person) NE Person.
*  First encounter for later persons:      .
.  COMPUTE Encounter = 1.
ELSE IF  LAG(Cop_Recd) EQ 1.
*  After a police encounter record, start  .
*  numbering for a new encounter.          .
.  COMPUTE Encounter = Encounter + 1.
END IF.
. /**/ LIST.

*  ............ Example:   ................................... .
*  Summarize mental-health records between successive police   .
*  encounters.                                                 .
*  (Many other summaries are possible, including all variables .
*  totals or means of variables from the mental-health records.).


*  Police records are dropped before AGGREGATE, so save them   .
*  in a scratch file.                                          .
*  (In SPSS 14/15, save in a dataset, but logic is more        .
*  complicated.)                                               .
*  "Police#" may well be a file handle.                        .

TEMPORARY.
SELECT IF Cop_Recd.
SAVE OUTFILE = Police#
    /DROP=Cop_Recd /* Would always be 1 in this file */
    /DROP=Datum    /* Drop all vbls specific to mental health */.


*  For AGGREGATE of mental-health records, drop police records .
SELECT IF NOT Cop_Recd.
. /*-- LIST.

*  Summary of mental-health records between encounters:        .
AGGREGATE OUTFILE=*
    /BREAK =  Person Encounter
    /N_Eval   '# of mental health evaluations since last encounter'
           =  N
    /LtstMhDt 'Date of latest evaluation preceding encounter'
           =  LAST(Date)
    /LtstDatm 'Datum on most recent mental-health encounter'
           =  LAST(Datum)
    /FrstMhDt 'Date of first evaluation since previous encounter'
           =  FIRST(Date)
    /FrstDatm 'Datum on first evaluation since previous encounter'
           =  FIRST(Datum).
. /*-- LIST.

*  Re-attach police records, and list:                         .
MATCH FILES
    /FILE=Police#
    /FILE=*
    /BY =  Person Encounter.
RECODE N_Eval (MISSING=0).
LIST.

..........................
Appendix I:   Draft output
* ...................................................          .

* .....  Test data: Police        ..... .
GET FILE=Police.
LIST.
|-----------------------------|---------------------------|
|Output Created               |06-NOV-2006 17:12:11       |
|-----------------------------|---------------------------|
C:\Documents and Settings\Richard\My Documents\Temporary
   \SPSS2006-11-05 Green - complex merge question - Police.SAV

Person        Date Cop_Data

     1  06-FEB-2006 Z
     1  02-MAR-2006 Y
     2  01-MAY-2006 X
     3  13-MAY-2006 W
     3  12-JUL-2006 V

Number of cases read:  5    Number of cases listed:  5


* .....  Test data: Mental Health ..... .

GET FIle=MntlHlth.
LIST.
|-----------------------------|---------------------------|
|Output Created               |06-NOV-2006 17:12:11       |
|-----------------------------|---------------------------|
C:\Documents and Settings\Richard\My Documents\Temporary
   \SPSS2006-11-05 Green - complex merge question - MntlHlth.SAV

Person        Date Datum

     1  13-JAN-2006 Alpha
     1  25-JAN-2006 Beta
     1  18-FEB-2006 Delta
     2  14-MAR-2006 Zeta
     2  26-MAR-2006 Eta
     2  07-APR-2006 Theta
     2  19-APR-2006 Iota
     3  25-MAY-2006 Mu
     3  06-JUN-2006 Nu
     3  18-JUN-2006 Xi
     3  30-JUN-2006 Omikron
     3  24-JUL-2006 Rho

Number of cases read:  12    Number of cases listed:  12


*  This is the fundamental interleave operation. If mental-    .
*  health evaluation and police encounter are on the same day, .
*  it puts mental-health evaluation first. (Recommended.)      .

ADD FILES
    /File=MntlHlth
    /File=Police   /IN=Cop_Recd
    /BY Person Date.
. /**/ LIST.
|-----------------------------|---------------------------|
|Output Created               |06-NOV-2006 17:12:11       |
|-----------------------------|---------------------------|

Person        Date Datum    Cop_Data Cop_Recd

     1  13-JAN-2006 Alpha                 0
     1  25-JAN-2006 Beta                  0
     1  06-FEB-2006          Z            1
     1  18-FEB-2006 Delta                 0
     1  02-MAR-2006          Y            1
     2  14-MAR-2006 Zeta                  0
     2  26-MAR-2006 Eta                   0
     2  07-APR-2006 Theta                 0
     2  19-APR-2006 Iota                  0
     2  01-MAY-2006          X            1
     3  13-MAY-2006          W            1
     3  25-MAY-2006 Mu                    0
     3  06-JUN-2006 Nu                    0
     3  18-JUN-2006 Xi                    0
     3  30-JUN-2006 Omikron               0
     3  12-JUL-2006          V            1
     3  24-JUL-2006 Rho                   0

Number of cases read:  17    Number of cases listed:  17


*  Here's LEAVE: attach to each record the sequence number of  .
*  the current, or next following, police encounter.           .

NUMERIC   Encounter (F3).
VAR LABEL Encounter
     'Sequential #, next later police encounter'.
LEAVE     Encounter.

DO IF    MISSING(LAG(Person)).
*  First encounter for first person:       .
.  COMPUTE Encounter = 1.
ELSE IF  LAG(Person) NE Person.
*  First encounter for later persons:      .
.  COMPUTE Encounter = 1.
ELSE IF  LAG(Cop_Recd) EQ 1.
*  After a police encounter record, start  .
*  numbering for a new encounter.          .
.  COMPUTE Encounter = Encounter + 1.
END IF.
. /**/ LIST.
|-----------------------------|---------------------------|
|Output Created               |06-NOV-2006 17:12:11       |
|-----------------------------|---------------------------|
Person        Date Datum    Cop_Data Cop_Recd Encounter

     1  13-JAN-2006 Alpha                 0         1
     1  25-JAN-2006 Beta                  0         1
     1  06-FEB-2006          Z            1         1
     1  18-FEB-2006 Delta                 0         2
     1  02-MAR-2006          Y            1         2
     2  14-MAR-2006 Zeta                  0         1
     2  26-MAR-2006 Eta                   0         1
     2  07-APR-2006 Theta                 0         1
     2  19-APR-2006 Iota                  0         1
     2  01-MAY-2006          X            1         1
     3  13-MAY-2006          W            1         1
     3  25-MAY-2006 Mu                    0         2
     3  06-JUN-2006 Nu                    0         2
     3  18-JUN-2006 Xi                    0         2
     3  30-JUN-2006 Omikron               0         2
     3  12-JUL-2006          V            1         2
     3  24-JUL-2006 Rho                   0         3

Number of cases read:  17    Number of cases listed:  17


*  ............ Example:   ................................... .
*  Summarize mental-health records between successive police   .
*  encounters.                                                 .
*  (Many other summaries are possible, including all variables .
*  totals or means of variables from the mental-health records.).


*  Police records are dropped before AGGREGATE, so save them   .
*  in a scratch file.                                          .
*  (In SPSS 14/15, save in a dataset, but logic is more        .
*  complicated.)                                               .
*  "Police#" may well be a file handle.                        .

TEMPORARY.
SELECT IF Cop_Recd.
SAVE OUTFILE = Police#
    /DROP=Cop_Recd /* Would always be 1 in this file */
    /DROP=Datum    /* Drop all vbls specific to mental health */.


*  For AGGREGATE of mental-health records, drop police records .
SELECT IF NOT Cop_Recd.
. /*-- LIST.

*  Summary of mental-health records between encounters:        .
AGGREGATE OUTFILE=*
    /BREAK =  Person Encounter
    /N_Eval   '# of mental health evaluations since last encounter'
           =  N
    /LtstMhDt 'Date of latest evaluation preceding encounter'
           =  LAST(Date)
    /LtstDatm 'Datum on most recent mental-health encounter'
           =  LAST(Datum)
    /FrstMhDt 'Date of first evaluation since previous encounter'
           =  FIRST(Date)
    /FrstDatm 'Datum on first evaluation since previous encounter'
           =  FIRST(Datum).
. /*-- LIST.

*  Re-attach police records, and list:                         .
MATCH FILES
    /FILE=Police#
    /FILE=*
    /BY =  Person Encounter.
RECODE N_Eval (MISSING=0).
LIST.
|-----------------------------|---------------------------|
|Output Created               |06-NOV-2006 17:12:14       |
|-----------------------------|---------------------------|
Per                      Encou
son        Date Cop_Data nter   N_Eval    LtstMhDt LtstDatm    FrstMhDt
FrstDatm

   1 06-FEB-2006 Z           1        2 25-JAN-2006
Beta     13-JAN-2006 Alpha
   1 02-MAR-2006 Y           2        1 18-FEB-2006
Delta    18-FEB-2006 Delta
   2 01-MAY-2006 X           1        4 19-APR-2006
Iota     14-MAR-2006 Zeta
   3 13-MAY-2006
W           1        0           .                    .
   3 12-JUL-2006 V           2        4 30-JUN-2006
Omikron  25-MAY-2006 Mu
   3           .             3        1 24-JUL-2006
Rho      24-JUL-2006 Rho

Number of cases read:  6    Number of cases listed:  6
* ............................................................ .

..........................
Appendix II:  Test data
*  Test data; relatively complicated   .
NEW FILE.
DATA LIST LIST /
   Person(N3) Cop_Recd(F2) Date(ADATE12)  Datum(A8).
BEGIN DATA
001     0       01/01/2006  Alpha
001     0       01/01/2006  Beta
001     1       01/01/2006  Z
001     0       01/01/2006  Delta
001     1       01/01/2006  Y
002     0       01/01/2006  Zeta
002     0       01/01/2006  Eta
002     0       01/01/2006  Theta
002     0       01/01/2006  Iota
002     1       01/01/2006  X
003     1       01/01/2006  W
003     0       01/01/2006  Mu
003     0       01/01/2006  Nu
003     0       01/01/2006  Xi
003     0       01/01/2006  Omikron
003     1       01/01/2006  V
003     0       01/01/2006  Rho
END DATA.
COMPUTE DATE=DATE+TIME.DAYS($CASENUM*12).
FORMATS DATE (DATE11).
. /*-- LIST.
TEMPORARY.
SELECT IF COP_RECD EQ 1.
SAVE OUTFILE=Police
   /RENAME = (Datum = Cop_Data)
   /DROP   = COP_RECD.

TEMPORARY.
SELECT IF COP_RECD EQ 0.
SAVE OUTFILE=MntlHlth
   /DROP   = COP_RECD.