SPSSX Discussion

complex merge question

Classic

List

Threaded

11 messages Options

bgreen

complex merge question

Hello,

I was hoping for some advice regarding if and how the following can be
performed in SPSS.

I have been provided two data files in EXCEL format (originally the data
was in ACCESS). File A contains variables such as 'DOB', 'court',
'hearing_date', ''offence code', sentence' 'outcome' pertaining to police
contacts. File B contains mental health presentations and includes 'DOB',
'presentation' 'outcome'. Id number and DOB are common to both files,
though sometimes there is date of birth discrepancies.

Where things are complicated is that for any individual, each police
contact is recorded on a separate row and where different outcomes resulted
there may be more than one row per contact. Similarly for mental health
presentation, there are multiple presentations which sometimes have
different outcomes recorded - so there would be two rows of data.

Merging two files with multiple rows per individual (but not necessarily
equal numbers of rows per file) seemed a complex task. For example, one
individual may have 2 incidents of police contact (=2 rows of records in
file A) but another person 15; similarly for file B, there is not one row
per individual but varying numbers of rows for presentation. I am unsure
how to proceed with this. One option seemed to be to transform the data so
that each individual only had one row of data (which potentially raises
other problems by significantly increasing the numbers of variables) or
performing a merge preserving the original structure of each file, and
having Id's align

Any assistance regarding how this task might be performed is appreciated. I
believe I can access versions 11 & 12, possibly 13.

regards

Bob

VJ-6

A Word utility to ease writing of syntax

A Word utility to ease writing of syntax for labels, recoding, and
formatting text output. Useful for creating syntax for variable and value
labels, an otherwise painful task.

a.. Download SPSS coding assistant http://vgupta.com/z_spss_tools.doc

Choose the option "Save" and not "Run". After saving the file, open it
with Microsoft Word. You may have to lower the "Macro Security" protection
if you get an error message.

Hal 9000

Re: A Word utility to ease writing of syntax

Thanks VJ--I look forward to hearing back from the other SPSS'ers. I'm
not brave enough to open a macro that somebody else wrote! (They could
be a SAS person, you know).
-Gary

On 11/3/06, VJ <[hidden email]> wrote:

> A Word utility to ease writing of syntax for labels, recoding, and
> formatting text output. Useful for creating syntax for variable and value
> labels, an otherwise painful task.
>
> a.. Download SPSS coding assistant http://vgupta.com/z_spss_tools.doc
>
> Choose the option "Save" and not "Run". After saving the file, open it
> with Microsoft Word. You may have to lower the "Macro Security" protection
> if you get an error message.
>

Richard Ristow

Re: complex merge question

In reply to this post by bgreen

At 05:57 PM 11/3/2006, Bob Green wrote:

>I have two data files. File A contains variables pertaining to police
>contacts. File B contains mental health presentations. Id number and
>DOB are common to both files, though sometimes there is date of birth
>discrepancies.
>FWhere things are complicated is that for any individual, each police
>
>Each police contact is recorded on a separate row and there may [even]
>be more than one row per contact. Similarly for mental health
>presentation, there are multiple presentations which sometimes have
>different outcomes recorded - so there would be two rows of data.
>
>Merging two files with multiple rows per individual (but not
>necessarily equal numbers of rows per file) seemed a complex task.

It is a complex task, but the first question is, what kind of merge do
you want?

If each mental health presentation is associated with a specific police
record, then you merge on that basis, and it's not too bad.

You could do a many-to-many merge, merging EVERY mental-health
presentation with EVERY police record. Somebody with 5 police records
and 3 mental-health presentations would then have 15 records. If that's
what you want, it can be done - it's tricky but doable in SPSS. But it
doesn't sound right; among other things, it can give different people
radically different numbers of records for reasons that aren't well
tied to what happened to them.

What questions are you trying to address? That will help a lot,
thinking about how you want your data structured.

-Wishing you best of luck,
Richard

bgreen

Re: complex merge question

Richard,

Thanks for your reply.

The basic question of interest is whether a mental health admission makes
any difference to subsequent police contact. Once the data is formatted,
the idea at this stage is to have a series of variables relating to
seriousness of offence pre and post admission. Is this too vague? Seeing a
sample of the data may give a better idea of what it looks like. I think
ultimately, we will have some summary variables - offence against person
-pre and post etc, but it will be easier to see what is going on if the
data was all in one file

regards

Bob

statisticsdoc

Re: complex merge question

Stephen Brand
www.statisticsdoc.com

Bob,

This is an interesting dataset that presents many possibilities. Are you
particularly interested in the first admission as a watershed event? One
thought is that you could, in broad strokes:

1.) utilize the mental health record file to determine the date of first
admission, resulting in one mental health record per case;
2.) then do a one to many match between the date of first admission file and
the police records,
3.) then code the police records as occuring before or after the first
admission,
4.) then aggregate pre-admission and post-admission records separately with
regard to a severity indicator.

Of course, I do not want to presume that this would address your question.
Are you interested in investigating the idea that mental health admissions
precede a worsening of criminal activity among individuals who have a prior
arrest history? There is also the possibility that admissions might, in
some contexts, precede a decrease in certain types of criminal activity.
Along these lines, some researchers working with de-institionalized patients
have suggested that the general population of people who are hospitalized
for mental illness are, compared with the general population, more likely to
be victims than perpertators of serious crimes, although it is not clear how
well this statement applies to individuals who have prior arrest histories.

HTH,

Stephen Brand

For personalized and professional consultation in statistics and research
design, visit
www.statisticsdoc.com

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
Bob Green
Sent: Saturday, November 04, 2006 6:33 PM
To: [hidden email]
Subject: Re: complex merge question

Richard,

Thanks for your reply.

The basic question of interest is whether a mental health admission makes
any difference to subsequent police contact. Once the data is formatted,
the idea at this stage is to have a series of variables relating to
seriousness of offence pre and post admission. Is this too vague? Seeing a
sample of the data may give a better idea of what it looks like. I think
ultimately, we will have some summary variables - offence against person
-pre and post etc, but it will be easier to see what is going on if the
data was all in one file

regards

Bob

Richard Ristow

Re: complex merge question

In reply to this post by bgreen

At 06:32 PM 11/4/2006, Bob Green wrote:

>The basic question of interest is whether a mental health admission
>makes any difference to subsequent police contact. The idea at this
>stage is to have a series of variables relating to seriousness of
>offence pre and post admission. Is this too vague?

It's enough to get going with. The crucial things you say are
"SUBSEQUENT police contact" and "PRE AND POST admission" (emphasis
added). That is, you're studying a single time series (for each
subject), in which events of both types may occur.

So you don't want to merge. You want to interleave the two kinds of
record, in chronological order.

Then, you can use LAG or, more likely, variables with LEAVE specified,
to carry over information, so for each police contact you have
information on for prior mental health can tell, for each police
contact, whatever you need about prior mental health presentations --
whether there have been any; how many; what was done; how long before
the police contact. You can also have information about police contacts
prior to the current one.

So, interleaving is your data organization. For what you do afterwards,
think what you intend. In particular, look at the advice and questions
that Stephen Brand has posted.

I think this is a step farther than you were: think of it as building a
single time series of the different events, rather than as a merge.

-Good luck,
Richard

bgreen

Re: complex merge question

In reply to this post by statisticsdoc

Richard & Stephen,

Basically Stephen's points 1-4 are what I have in mind. Given there are
over 16,000 records I am trying to find a way to more efficiently perform
the task.

Richard, interleaving is not a term I am familiar with. From your e-mail it
seems you are suggesting the steps suggested by Stephen can be performed
using command such as LAG or LEAVE. Is this correct?

regards

Bob

At 09:08 PM 4/11/2006 -0500, Statisticsdoc wrote:

>Stephen Brand
>www.statisticsdoc.com
>
>
>Bob,
>
>This is an interesting dataset that presents many possibilities. Are you
>particularly interested in the first admission as a watershed event? One
>thought is that you could, in broad strokes:
>
>1.) utilize the mental health record file to determine the date of first
>admission, resulting in one mental health record per case;
>2.) then do a one to many match between the date of first admission file and
>the police records,
>3.) then code the police records as occuring before or after the first
>admission,
>4.) then aggregate pre-admission and post-admission records separately with
>regard to a severity indicator.
>
>Of course, I do not want to presume that this would address your question.
>Are you interested in investigating the idea that mental health admissions
>precede a worsening of criminal activity among individuals who have a prior
>arrest history? There is also the possibility that admissions might, in
>some contexts, precede a decrease in certain types of criminal activity.
>Along these lines, some researchers working with de-institionalized patients
>have suggested that the general population of people who are hospitalized
>for mental illness are, compared with the general population, more likely to
>be victims than perpertators of serious crimes, although it is not clear how
>well this statement applies to individuals who have prior arrest histories.
>
>HTH,
>
>Stephen Brand
>
>
>For personalized and professional consultation in statistics and research
>design, visit
>www.statisticsdoc.com
>
>
>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
>Bob Green
>Sent: Saturday, November 04, 2006 6:33 PM
>To: [hidden email]
>Subject: Re: complex merge question
>
>
>Richard,
>
>Thanks for your reply.
>
>The basic question of interest is whether a mental health admission makes
>any difference to subsequent police contact. Once the data is formatted,
>the idea at this stage is to have a series of variables relating to
>seriousness of offence pre and post admission. Is this too vague? Seeing a
>sample of the data may give a better idea of what it looks like. I think
>ultimately, we will have some summary variables - offence against person
>-pre and post etc, but it will be easier to see what is going on if the
>data was all in one file
>
>regards
>
>Bob

Dennis Deck

Re: complex merge question

In reply to this post by bgreen

I'm currently working on a problem that seems of have some similar
elements: examine the relationship of a) Medicaid coverage on
utilization of opiate treatment and b) time in opiate treatment on
felony arrests.

For both models the independent and dependent variables vary over time
in complex ways if I follow them for 3 years. Thus unless I treat this
as a longitudinal analysis with repeated monthly observations, it is
hard to appropriately model the effects.

I decided that the best approach was to do a multilevel analysis:

Data structure:
Create table of up to 36 monthly dummy records for each person
Convert periods of Medicaid eligibility into monthly records
Ditto for substance abuse treatment and arrest files
Use MATCH FILES to do table lookups and merge the service and
outcome data with the master file by client and month.

Level 1 (Time): 36 monthly observations from an initial admission
with:
Time (number of months from initial admission)
Medicaid eligibility that month
Treatment status that month
Arrest status that month

Level 2 (Person): Person level covariates
Prior history of treatment and arrest
Propensity scores for Medicaid coverage and treatment
placement

While this is not how you have framed your question, you may want to
consider a more longitudinal alternative. Most mental health conditions
are chronic (unless you are restricting attention to first admissions
for low severity cases). Treatment is typically provided over a long
period of time but clients may be in and out of treatment. There is
often a prior history of treatment. While most studies of MH and SA
treatment still tend to focus on target episodes, there is growing
interest in a more longitudinal perspective.

Reframing the question changes how the file should be structured.

Dennis Deck, PhD
RMC Research Corporation
[hidden email]

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
Bob Green
Sent: Saturday, November 04, 2006 6:33 PM
To: [hidden email]
Subject: Re: complex merge question

Richard,

Thanks for your reply.

The basic question of interest is whether a mental health admission
makes
any difference to subsequent police contact. Once the data is formatted,
the idea at this stage is to have a series of variables relating to
seriousness of offence pre and post admission. Is this too vague?
Seeing a
sample of the data may give a better idea of what it looks like. I
think
ultimately, we will have some summary variables - offence against person
-pre and post etc, but it will be easier to see what is going on if the
data was all in one file

regards

Bob

Richard Ristow

Re: complex merge question

In reply to this post by bgreen

(And apologies for not responding yesterday. I was traveling, or tired
after traveling, the whole day.)

At 02:26 AM 11/5/2006, Bob Green wrote:

>Richard, interleaving is not a term I am familiar with.

It means putting the records in the same file, ordered by person and
date (hence 'interleaving'). See "BY Subcommand" in article "ADD FILES"
in the Command Syntax Reference.

>Basically Stephen's points 1-4 are what I have in mind. From your
>e-mail it seems the steps suggested by Stephen can be performed using
>command such as LAG or LEAVE. Is this correct?

Yes; in fact, a larger range of operations can be performed that way.
That's not to sell AGGREGATE short, however. It's very powerful and
very flexible, and you'll almost certainly use it, as well. Stephen's
suggestions are a special case that don't need interleaving.

Code below is untested, except as noted. It all assumes and requires,
. Files Police and MntlHlth contain variables Person and Date, and have
no other variables in common
. Both files are sorted by Person and Date.
. Variables the code creates do not conflict with any variables in
either file.

>>1.) utilize the mental health record file to determine the date of
>>first admission, resulting in one mental health record per case;

As Stephen wrote, this gives "one [summary] mental health record per
[person]: the person's first record. That has, inherently, two steps:
a.) Derive the single summary record from the mental-health file
b.) Attach that record to all police records for the person, using a
MATCH FILES in which the summary mental-health file is a TABLE.

Two ways to do the summary:
GET FILE = MmtlHlth.
AGGREGATE OUTFILE=I
/BREAK=Person
/MH_Frst 'Date of first mental-health record'
=MIN(DATE).

The above produces a file with only two variables: Person and MH_Frst.
The following retains all variables from the first mental-health
record, and may be preferable:

ADD FILES
/FILE=MntlHlth /RENAME=(DATE=MH_Frst)
/FIRST=Visit1.
SELECT IF (Visit1 = 1).
VAR LABEL MH_Frst 'Date of first mental-health record'.

>>2.) then do a one to many match between the date of first admission
>>file and the police records,

Like this, with file produced by either of the above two methods:

MATCH FILES
/TABLE=*
/FILE=Police
/BY PERSON.

>>3.) then code the police records as occurring before or after the
>>first admission,
>>4.) then aggregate pre-admission and post-admission records
>>separately with
>>regard to a severity indicator.

You can do a lot more with interleave logic. I'll give one example
only, which uses interleave, then AGGREGATE, to summarize mental-health
records between successive police encounters. It's tested. This is the
syntax; run output, with test data, is at the end.

* This is the fundamental interleave operation. If mental- .
* health evaluation and police encounter are on the same day, .
* it puts mental-health evaluation first. (Recommended.) .

ADD FILES
/File=MntlHlth
/File=Police /IN=Cop_Recd
/BY Person Date.
. /**/ LIST.

* Here's LEAVE: attach to each record the sequence number of .
* the current, or next following, police encounter. .

NUMERIC Encounter (F3).
VAR LABEL Encounter
'Sequential #, next later police encounter'.
LEAVE Encounter.

DO IF MISSING(LAG(Person)).
* First encounter for first person: .
. COMPUTE Encounter = 1.
ELSE IF LAG(Person) NE Person.
* First encounter for later persons: .
. COMPUTE Encounter = 1.
ELSE IF LAG(Cop_Recd) EQ 1.
* After a police encounter record, start .
* numbering for a new encounter. .
. COMPUTE Encounter = Encounter + 1.
END IF.
. /**/ LIST.

* ............ Example: ................................... .
* Summarize mental-health records between successive police .
* encounters. .
* (Many other summaries are possible, including all variables .
* totals or means of variables from the mental-health records.).

* Police records are dropped before AGGREGATE, so save them .
* in a scratch file. .
* (In SPSS 14/15, save in a dataset, but logic is more .
* complicated.) .
* "Police#" may well be a file handle. .

TEMPORARY.
SELECT IF Cop_Recd.
SAVE OUTFILE = Police#
/DROP=Cop_Recd /* Would always be 1 in this file */
/DROP=Datum /* Drop all vbls specific to mental health */.

* For AGGREGATE of mental-health records, drop police records .
SELECT IF NOT Cop_Recd.
. /*-- LIST.

* Summary of mental-health records between encounters: .
AGGREGATE OUTFILE=*
/BREAK = Person Encounter
/N_Eval '# of mental health evaluations since last encounter'
= N
/LtstMhDt 'Date of latest evaluation preceding encounter'
= LAST(Date)
/LtstDatm 'Datum on most recent mental-health encounter'
= LAST(Datum)
/FrstMhDt 'Date of first evaluation since previous encounter'
= FIRST(Date)
/FrstDatm 'Datum on first evaluation since previous encounter'
= FIRST(Datum).
. /*-- LIST.

* Re-attach police records, and list: .
MATCH FILES
/FILE=Police#
/FILE=*
/BY = Person Encounter.
RECODE N_Eval (MISSING=0).
LIST.

..........................
Appendix I: Draft output
* ................................................... .

* ..... Test data: Police ..... .
GET FILE=Police.
LIST.
|-----------------------------|---------------------------|
|Output Created |06-NOV-2006 17:12:11 |
|-----------------------------|---------------------------|
C:\Documents and Settings\Richard\My Documents\Temporary
\SPSS2006-11-05 Green - complex merge question - Police.SAV

Person Date Cop_Data

1 06-FEB-2006 Z
1 02-MAR-2006 Y
2 01-MAY-2006 X
3 13-MAY-2006 W
3 12-JUL-2006 V

Number of cases read: 5 Number of cases listed: 5

* ..... Test data: Mental Health ..... .

GET FIle=MntlHlth.
LIST.
|-----------------------------|---------------------------|
|Output Created |06-NOV-2006 17:12:11 |
|-----------------------------|---------------------------|
C:\Documents and Settings\Richard\My Documents\Temporary
\SPSS2006-11-05 Green - complex merge question - MntlHlth.SAV

Person Date Datum

1 13-JAN-2006 Alpha
1 25-JAN-2006 Beta
1 18-FEB-2006 Delta
2 14-MAR-2006 Zeta
2 26-MAR-2006 Eta
2 07-APR-2006 Theta
2 19-APR-2006 Iota
3 25-MAY-2006 Mu
3 06-JUN-2006 Nu
3 18-JUN-2006 Xi
3 30-JUN-2006 Omikron
3 24-JUL-2006 Rho

Number of cases read: 12 Number of cases listed: 12

* This is the fundamental interleave operation. If mental- .
* health evaluation and police encounter are on the same day, .
* it puts mental-health evaluation first. (Recommended.) .

ADD FILES
/File=MntlHlth
/File=Police /IN=Cop_Recd
/BY Person Date.
. /**/ LIST.
|-----------------------------|---------------------------|
|Output Created |06-NOV-2006 17:12:11 |
|-----------------------------|---------------------------|

Person Date Datum Cop_Data Cop_Recd

1 13-JAN-2006 Alpha 0
1 25-JAN-2006 Beta 0
1 06-FEB-2006 Z 1
1 18-FEB-2006 Delta 0
1 02-MAR-2006 Y 1
2 14-MAR-2006 Zeta 0
2 26-MAR-2006 Eta 0
2 07-APR-2006 Theta 0
2 19-APR-2006 Iota 0
2 01-MAY-2006 X 1
3 13-MAY-2006 W 1
3 25-MAY-2006 Mu 0
3 06-JUN-2006 Nu 0
3 18-JUN-2006 Xi 0
3 30-JUN-2006 Omikron 0
3 12-JUL-2006 V 1
3 24-JUL-2006 Rho 0

Number of cases read: 17 Number of cases listed: 17

* Here's LEAVE: attach to each record the sequence number of .
* the current, or next following, police encounter. .

NUMERIC Encounter (F3).
VAR LABEL Encounter
'Sequential #, next later police encounter'.
LEAVE Encounter.

DO IF MISSING(LAG(Person)).
* First encounter for first person: .
. COMPUTE Encounter = 1.
ELSE IF LAG(Person) NE Person.
* First encounter for later persons: .
. COMPUTE Encounter = 1.
ELSE IF LAG(Cop_Recd) EQ 1.
* After a police encounter record, start .
* numbering for a new encounter. .
. COMPUTE Encounter = Encounter + 1.
END IF.
. /**/ LIST.
|-----------------------------|---------------------------|
|Output Created |06-NOV-2006 17:12:11 |
|-----------------------------|---------------------------|
Person Date Datum Cop_Data Cop_Recd Encounter

1 13-JAN-2006 Alpha 0 1
1 25-JAN-2006 Beta 0 1
1 06-FEB-2006 Z 1 1
1 18-FEB-2006 Delta 0 2
1 02-MAR-2006 Y 1 2
2 14-MAR-2006 Zeta 0 1
2 26-MAR-2006 Eta 0 1
2 07-APR-2006 Theta 0 1
2 19-APR-2006 Iota 0 1
2 01-MAY-2006 X 1 1
3 13-MAY-2006 W 1 1
3 25-MAY-2006 Mu 0 2
3 06-JUN-2006 Nu 0 2
3 18-JUN-2006 Xi 0 2
3 30-JUN-2006 Omikron 0 2
3 12-JUL-2006 V 1 2
3 24-JUL-2006 Rho 0 3

Number of cases read: 17 Number of cases listed: 17

* ............ Example: ................................... .
* Summarize mental-health records between successive police .
* encounters. .
* (Many other summaries are possible, including all variables .
* totals or means of variables from the mental-health records.).

* Police records are dropped before AGGREGATE, so save them .
* in a scratch file. .
* (In SPSS 14/15, save in a dataset, but logic is more .
* complicated.) .
* "Police#" may well be a file handle. .

TEMPORARY.
SELECT IF Cop_Recd.
SAVE OUTFILE = Police#
/DROP=Cop_Recd /* Would always be 1 in this file */
/DROP=Datum /* Drop all vbls specific to mental health */.

* For AGGREGATE of mental-health records, drop police records .
SELECT IF NOT Cop_Recd.
. /*-- LIST.

* Summary of mental-health records between encounters: .
AGGREGATE OUTFILE=*
/BREAK = Person Encounter
/N_Eval '# of mental health evaluations since last encounter'
= N
/LtstMhDt 'Date of latest evaluation preceding encounter'
= LAST(Date)
/LtstDatm 'Datum on most recent mental-health encounter'
= LAST(Datum)
/FrstMhDt 'Date of first evaluation since previous encounter'
= FIRST(Date)
/FrstDatm 'Datum on first evaluation since previous encounter'
= FIRST(Datum).
. /*-- LIST.

* Re-attach police records, and list: .
MATCH FILES
/FILE=Police#
/FILE=*
/BY = Person Encounter.
RECODE N_Eval (MISSING=0).
LIST.
|-----------------------------|---------------------------|
|Output Created |06-NOV-2006 17:12:14 |
|-----------------------------|---------------------------|
Per Encou
son Date Cop_Data nter N_Eval LtstMhDt LtstDatm FrstMhDt
FrstDatm

1 06-FEB-2006 Z 1 2 25-JAN-2006
Beta 13-JAN-2006 Alpha
1 02-MAR-2006 Y 2 1 18-FEB-2006
Delta 18-FEB-2006 Delta
2 01-MAY-2006 X 1 4 19-APR-2006
Iota 14-MAR-2006 Zeta
3 13-MAY-2006
W 1 0 . .
3 12-JUL-2006 V 2 4 30-JUN-2006
Omikron 25-MAY-2006 Mu
3 . 3 1 24-JUL-2006
Rho 24-JUL-2006 Rho

Number of cases read: 6 Number of cases listed: 6
* ............................................................ .

..........................
Appendix II: Test data
* Test data; relatively complicated .
NEW FILE.
DATA LIST LIST /
Person(N3) Cop_Recd(F2) Date(ADATE12) Datum(A8).
BEGIN DATA
001 0 01/01/2006 Alpha
001 0 01/01/2006 Beta
001 1 01/01/2006 Z
001 0 01/01/2006 Delta
001 1 01/01/2006 Y
002 0 01/01/2006 Zeta
002 0 01/01/2006 Eta
002 0 01/01/2006 Theta
002 0 01/01/2006 Iota
002 1 01/01/2006 X
003 1 01/01/2006 W
003 0 01/01/2006 Mu
003 0 01/01/2006 Nu
003 0 01/01/2006 Xi
003 0 01/01/2006 Omikron
003 1 01/01/2006 V
003 0 01/01/2006 Rho
END DATA.
COMPUTE DATE=DATE+TIME.DAYS($CASENUM*12).
FORMATS DATE (DATE11).
. /*-- LIST.
TEMPORARY.
SELECT IF COP_RECD EQ 1.
SAVE OUTFILE=Police
/RENAME = (Datum = Cop_Data)
/DROP = COP_RECD.

TEMPORARY.
SELECT IF COP_RECD EQ 0.
SAVE OUTFILE=MntlHlth
/DROP = COP_RECD.

statisticsdoc

Re: complex merge question

Richard,

Well done!

Stephen Brand

For personalized and professional consultation in statistics and research
design, visit
www.statisticsdoc.com

-----Original Message-----
From: Richard Ristow [mailto:[hidden email]]
Sent: Monday, November 06, 2006 5:26 PM
To: Bob Green; [hidden email]
Cc: [hidden email]
Subject: Re: complex merge question

(And apologies for not responding yesterday. I was traveling, or tired
after traveling, the whole day.)

At 02:26 AM 11/5/2006, Bob Green wrote:

>Richard, interleaving is not a term I am familiar with.

It means putting the records in the same file, ordered by person and
date (hence 'interleaving'). See "BY Subcommand" in article "ADD FILES"
in the Command Syntax Reference.

>Basically Stephen's points 1-4 are what I have in mind. From your
>e-mail it seems the steps suggested by Stephen can be performed using
>command such as LAG or LEAVE. Is this correct?

Yes; in fact, a larger range of operations can be performed that way.
That's not to sell AGGREGATE short, however. It's very powerful and
very flexible, and you'll almost certainly use it, as well. Stephen's
suggestions are a special case that don't need interleaving.

Code below is untested, except as noted. It all assumes and requires,
. Files Police and MntlHlth contain variables Person and Date, and have
no other variables in common
. Both files are sorted by Person and Date.
. Variables the code creates do not conflict with any variables in
either file.

>>1.) utilize the mental health record file to determine the date of
>>first admission, resulting in one mental health record per case;

As Stephen wrote, this gives "one [summary] mental health record per
[person]: the person's first record. That has, inherently, two steps:
a.) Derive the single summary record from the mental-health file
b.) Attach that record to all police records for the person, using a
MATCH FILES in which the summary mental-health file is a TABLE.

Two ways to do the summary:
GET FILE = MmtlHlth.
AGGREGATE OUTFILE=I
/BREAK=Person
/MH_Frst 'Date of first mental-health record'
=MIN(DATE).

The above produces a file with only two variables: Person and MH_Frst.
The following retains all variables from the first mental-health
record, and may be preferable:

ADD FILES
/FILE=MntlHlth /RENAME=(DATE=MH_Frst)
/FIRST=Visit1.
SELECT IF (Visit1 = 1).
VAR LABEL MH_Frst 'Date of first mental-health record'.

>>2.) then do a one to many match between the date of first admission
>>file and the police records,

Like this, with file produced by either of the above two methods:

MATCH FILES
/TABLE=*
/FILE=Police
/BY PERSON.

>>3.) then code the police records as occurring before or after the
>>first admission,
>>4.) then aggregate pre-admission and post-admission records
>>separately with
>>regard to a severity indicator.

You can do a lot more with interleave logic. I'll give one example
only, which uses interleave, then AGGREGATE, to summarize mental-health
records between successive police encounters. It's tested. This is the
syntax; run output, with test data, is at the end.

* This is the fundamental interleave operation. If mental- .
* health evaluation and police encounter are on the same day, .
* it puts mental-health evaluation first. (Recommended.) .

ADD FILES
/File=MntlHlth
/File=Police /IN=Cop_Recd
/BY Person Date.
. /**/ LIST.

* Here's LEAVE: attach to each record the sequence number of .
* the current, or next following, police encounter. .

NUMERIC Encounter (F3).
VAR LABEL Encounter
'Sequential #, next later police encounter'.
LEAVE Encounter.

DO IF MISSING(LAG(Person)).
* First encounter for first person: .
. COMPUTE Encounter = 1.
ELSE IF LAG(Person) NE Person.
* First encounter for later persons: .
. COMPUTE Encounter = 1.
ELSE IF LAG(Cop_Recd) EQ 1.
* After a police encounter record, start .
* numbering for a new encounter. .
. COMPUTE Encounter = Encounter + 1.
END IF.
. /**/ LIST.

* ............ Example: ................................... .
* Summarize mental-health records between successive police .
* encounters. .
* (Many other summaries are possible, including all variables .
* totals or means of variables from the mental-health records.).

* Police records are dropped before AGGREGATE, so save them .
* in a scratch file. .
* (In SPSS 14/15, save in a dataset, but logic is more .
* complicated.) .
* "Police#" may well be a file handle. .

TEMPORARY.
SELECT IF Cop_Recd.
SAVE OUTFILE = Police#
/DROP=Cop_Recd /* Would always be 1 in this file */
/DROP=Datum /* Drop all vbls specific to mental health */.

* For AGGREGATE of mental-health records, drop police records .
SELECT IF NOT Cop_Recd.
. /*-- LIST.

* Summary of mental-health records between encounters: .
AGGREGATE OUTFILE=*
/BREAK = Person Encounter
/N_Eval '# of mental health evaluations since last encounter'
= N
/LtstMhDt 'Date of latest evaluation preceding encounter'
= LAST(Date)
/LtstDatm 'Datum on most recent mental-health encounter'
= LAST(Datum)
/FrstMhDt 'Date of first evaluation since previous encounter'
= FIRST(Date)
/FrstDatm 'Datum on first evaluation since previous encounter'
= FIRST(Datum).
. /*-- LIST.

* Re-attach police records, and list: .
MATCH FILES
/FILE=Police#
/FILE=*
/BY = Person Encounter.
RECODE N_Eval (MISSING=0).
LIST.

..........................
Appendix I: Draft output
* ................................................... .

* ..... Test data: Police ..... .
GET FILE=Police.
LIST.
|-----------------------------|---------------------------|
|Output Created |06-NOV-2006 17:12:11 |
|-----------------------------|---------------------------|
C:\Documents and Settings\Richard\My Documents\Temporary
\SPSS2006-11-05 Green - complex merge question - Police.SAV

Person Date Cop_Data

1 06-FEB-2006 Z
1 02-MAR-2006 Y
2 01-MAY-2006 X
3 13-MAY-2006 W
3 12-JUL-2006 V

Number of cases read: 5 Number of cases listed: 5

* ..... Test data: Mental Health ..... .

GET FIle=MntlHlth.
LIST.
|-----------------------------|---------------------------|
|Output Created |06-NOV-2006 17:12:11 |
|-----------------------------|---------------------------|
C:\Documents and Settings\Richard\My Documents\Temporary
\SPSS2006-11-05 Green - complex merge question - MntlHlth.SAV

Person Date Datum

1 13-JAN-2006 Alpha
1 25-JAN-2006 Beta
1 18-FEB-2006 Delta
2 14-MAR-2006 Zeta
2 26-MAR-2006 Eta
2 07-APR-2006 Theta
2 19-APR-2006 Iota
3 25-MAY-2006 Mu
3 06-JUN-2006 Nu
3 18-JUN-2006 Xi
3 30-JUN-2006 Omikron
3 24-JUL-2006 Rho

Number of cases read: 12 Number of cases listed: 12

* This is the fundamental interleave operation. If mental- .
* health evaluation and police encounter are on the same day, .
* it puts mental-health evaluation first. (Recommended.) .

ADD FILES
/File=MntlHlth
/File=Police /IN=Cop_Recd
/BY Person Date.
. /**/ LIST.
|-----------------------------|---------------------------|
|Output Created |06-NOV-2006 17:12:11 |
|-----------------------------|---------------------------|

Person Date Datum Cop_Data Cop_Recd

1 13-JAN-2006 Alpha 0
1 25-JAN-2006 Beta 0
1 06-FEB-2006 Z 1
1 18-FEB-2006 Delta 0
1 02-MAR-2006 Y 1
2 14-MAR-2006 Zeta 0
2 26-MAR-2006 Eta 0
2 07-APR-2006 Theta 0
2 19-APR-2006 Iota 0
2 01-MAY-2006 X 1
3 13-MAY-2006 W 1
3 25-MAY-2006 Mu 0
3 06-JUN-2006 Nu 0
3 18-JUN-2006 Xi 0
3 30-JUN-2006 Omikron 0
3 12-JUL-2006 V 1
3 24-JUL-2006 Rho 0

Number of cases read: 17 Number of cases listed: 17

* Here's LEAVE: attach to each record the sequence number of .
* the current, or next following, police encounter. .

NUMERIC Encounter (F3).
VAR LABEL Encounter
'Sequential #, next later police encounter'.
LEAVE Encounter.

DO IF MISSING(LAG(Person)).
* First encounter for first person: .
. COMPUTE Encounter = 1.
ELSE IF LAG(Person) NE Person.
* First encounter for later persons: .
. COMPUTE Encounter = 1.
ELSE IF LAG(Cop_Recd) EQ 1.
* After a police encounter record, start .
* numbering for a new encounter. .
. COMPUTE Encounter = Encounter + 1.
END IF.
. /**/ LIST.
|-----------------------------|---------------------------|
|Output Created |06-NOV-2006 17:12:11 |
|-----------------------------|---------------------------|
Person Date Datum Cop_Data Cop_Recd Encounter

1 13-JAN-2006 Alpha 0 1
1 25-JAN-2006 Beta 0 1
1 06-FEB-2006 Z 1 1
1 18-FEB-2006 Delta 0 2
1 02-MAR-2006 Y 1 2
2 14-MAR-2006 Zeta 0 1
2 26-MAR-2006 Eta 0 1
2 07-APR-2006 Theta 0 1
2 19-APR-2006 Iota 0 1
2 01-MAY-2006 X 1 1
3 13-MAY-2006 W 1 1
3 25-MAY-2006 Mu 0 2
3 06-JUN-2006 Nu 0 2
3 18-JUN-2006 Xi 0 2
3 30-JUN-2006 Omikron 0 2
3 12-JUL-2006 V 1 2
3 24-JUL-2006 Rho 0 3

Number of cases read: 17 Number of cases listed: 17

* ............ Example: ................................... .
* Summarize mental-health records between successive police .
* encounters. .
* (Many other summaries are possible, including all variables .
* totals or means of variables from the mental-health records.).

* Police records are dropped before AGGREGATE, so save them .
* in a scratch file. .
* (In SPSS 14/15, save in a dataset, but logic is more .
* complicated.) .
* "Police#" may well be a file handle. .

TEMPORARY.
SELECT IF Cop_Recd.
SAVE OUTFILE = Police#
/DROP=Cop_Recd /* Would always be 1 in this file */
/DROP=Datum /* Drop all vbls specific to mental health */.

* For AGGREGATE of mental-health records, drop police records .
SELECT IF NOT Cop_Recd.
. /*-- LIST.

* Summary of mental-health records between encounters: .
AGGREGATE OUTFILE=*
/BREAK = Person Encounter
/N_Eval '# of mental health evaluations since last encounter'
= N
/LtstMhDt 'Date of latest evaluation preceding encounter'
= LAST(Date)
/LtstDatm 'Datum on most recent mental-health encounter'
= LAST(Datum)
/FrstMhDt 'Date of first evaluation since previous encounter'
= FIRST(Date)
/FrstDatm 'Datum on first evaluation since previous encounter'
= FIRST(Datum).
. /*-- LIST.

* Re-attach police records, and list: .
MATCH FILES
/FILE=Police#
/FILE=*
/BY = Person Encounter.
RECODE N_Eval (MISSING=0).
LIST.
|-----------------------------|---------------------------|
|Output Created |06-NOV-2006 17:12:14 |
|-----------------------------|---------------------------|
Per Encou
son Date Cop_Data nter N_Eval LtstMhDt LtstDatm FrstMhDt
FrstDatm

1 06-FEB-2006 Z 1 2 25-JAN-2006
Beta 13-JAN-2006 Alpha
1 02-MAR-2006 Y 2 1 18-FEB-2006
Delta 18-FEB-2006 Delta
2 01-MAY-2006 X 1 4 19-APR-2006
Iota 14-MAR-2006 Zeta
3 13-MAY-2006
W 1 0 . .
3 12-JUL-2006 V 2 4 30-JUN-2006
Omikron 25-MAY-2006 Mu
3 . 3 1 24-JUL-2006
Rho 24-JUL-2006 Rho

Number of cases read: 6 Number of cases listed: 6
* ............................................................ .

..........................
Appendix II: Test data
* Test data; relatively complicated .
NEW FILE.
DATA LIST LIST /
Person(N3) Cop_Recd(F2) Date(ADATE12) Datum(A8).
BEGIN DATA
001 0 01/01/2006 Alpha
001 0 01/01/2006 Beta
001 1 01/01/2006 Z
001 0 01/01/2006 Delta
001 1 01/01/2006 Y
002 0 01/01/2006 Zeta
002 0 01/01/2006 Eta
002 0 01/01/2006 Theta
002 0 01/01/2006 Iota
002 1 01/01/2006 X
003 1 01/01/2006 W
003 0 01/01/2006 Mu
003 0 01/01/2006 Nu
003 0 01/01/2006 Xi
003 0 01/01/2006 Omikron
003 1 01/01/2006 V
003 0 01/01/2006 Rho
END DATA.
COMPUTE DATE=DATE+TIME.DAYS($CASENUM*12).
FORMATS DATE (DATE11).
. /*-- LIST.
TEMPORARY.
SELECT IF COP_RECD EQ 1.
SAVE OUTFILE=Police
/RENAME = (Datum = Cop_Data)
/DROP = COP_RECD.

TEMPORARY.
SELECT IF COP_RECD EQ 0.
SAVE OUTFILE=MntlHlth
/DROP = COP_RECD.