SPSSX Discussion

Matching and Adding Cases

Classic

List

Threaded

6 messages Options

DKUKEC

Matching and Adding Cases

Dear SPSS List,

I have a question concerning matching and adding cases by a specific identifier using two different datasets. It is straight forward to calculate recidivism when we have 1 program start and end date per participants. However, I would like to create a DATASET that combines the two datasets and computes recidivism based on individuals with multiple prorgam cases (multiple program start and end dates).
I would also like to include the program cases that DO NOT have an arrest record as well. In the ARREST DATASET we have 2 participants, 4 arrests, and 5 charges (records), in the CASE DATASET we have 3 participants, 5 prorgam cases, 5 program start dates, and 5 program end dates.

Below is the two test DATASETS, along with a third data set which I created mannually to illustrate what I would like to do. Any suggestions would be greatly appreciated. Also, how does one read in a "missing" or blank data value for a variable?

Thank you in advance.

****************

DATA LIST FREE / RPUID (F4) ARREST_DT (DATE11) CHARGE (F3).
BEGIN DATA.
6294 18-Apr-2005 1.00
6294 18-Apr-2005 1.00
6294 27-May-2005 2.00
6294 31-Jan-2010 6.00
1234 31-Jan-2010 1.00
END DATA.
DATASET NAME ARRESTS.
SORT CASES BY RPUID ARREST_DT CHARGE.
LIST RPUID ARREST_DT CHARGE.
EXECUTE.

DATA LIST FREE / RPUID (F4) CASE_ID (F4) STARTDATE (DATE11) ENDDATE (DATE11) .
BEGIN DATA.
6294 1 18-Apr-2005 18-Apr-2006
1234 2 18-Apr-2011 11-May-2011
6294 3 01-May-2004 27-May-2004
6294 4 01-Dec-2011 30-Nov-2012
4321 5 01-Dec-2011 30-Nov-2012
END DATA.
DATASET NAME CASES.
SORT CASES BY RPUID CASE_ID STARTDATE ENDDATE .
LIST RPUID CASE_ID STARTDATE ENDDATE .
EXECUTE.

********

DATA LIST FREE / RPUID (F4) ARREST_DT (DATE11) CHARGE (F3) CASE_ID (F4) STARTDATE (DATE11) ENDDATE (DATE11).
BEGIN DATA
1234 31-JAN-2010 1 2 18-APR-2011 11-MAY-2011
6294 18-APR-2005 1 1 18-APR-2005 18-APR-2006
6294 18-APR-2005 1 1 18-APR-2005 18-APR-2006
6294 27-MAY-2005 2 1 18-APR-2005 18-APR-2006
6294 31-JAN-2010 6 1 18-APR-2005 18-APR-2006
6294 18-APR-2005 1 3 01-MAY-2004 27-MAY-2004
6294 18-APR-2005 1 3 01-MAY-2004 27-MAY-2004
6294 27-MAY-2005 2 3 01-MAY-2004 27-MAY-2004
6294 31-JAN-2010 6 3 01-MAY-2004 27-MAY-2004
6294 18-APR-2005 1 4 01-DEC-2011 30-NOV-2012
6294 18-APR-2005 1 4 01-DEC-2011 30-NOV-2012
6294 27-MAY-2005 2 4 01-DEC-2011 30-NOV-2012
6294 31-JAN-2010 6 4 01-DEC-2011 30-NOV-2012
4321 5 01-DEC-2011 30-NOV-2012
END DATA.
DATASET NAME TOTAL.
EXECUTE.

Andy W

Re: Matching and Adding Cases

Here is how I would go about it, you have to flatten one of the tables, merge, and then reshape to long again. I chose to flatten the CASES table here because in the end you wanted the people in the CASES table even if they did not have an arrest.

*********************************************************.
*1: Flatten the arrest table.
DATASET ACTIVATE ARRESTS.
CASESTOVARS
/ID RPUID
/SEPERATOR = "_".

*2: Table merge into cases.
DATASET ACTIVATE CASES.
MATCH FILES FILE = *
/TABLE = 'ARRESTS'
/BY RPUID.

*3: Reshape arrests - keep nulls.
*You will have to replace the end variable for each list.
VARSTOCASES
/MAKE ARREST_DT FROM ARREST_DT_1 TO ARREST_DT_4
/MAKE CHARGE FROM CHARGE_1 TO CHARGE_4
/NULL = KEEP.

*4: Get rid of duplicates.
SORT CASES BY RPUID STARTDATE ENDDATE ARREST_DT.
MATCH FILES FILE = *
/FIRST = Keep
/BY RPUID STARTDATE ENDDATE ARREST_DT.
SELECT IF Keep = 1.

*5: Get rid of empty arrests if there is another valid arrest.
COMPUTE MisArrest = MISSING(ARREST_DT).
SORT CASES BY RPUID MisArrest.
if RPUID = LAG(RPUID) AND MisArrest = 1 Keep = 0.
SELECT IF Keep = 1.
*********************************************************.

Thank you for the example data and expected outcome! For the easier question, to insert missing data for numeric variables on a data list command you can put any string that will not conform to a numeric variable (I frequently use a period). For blank strings you can just place quotations with nothing in them, e.g. [""].

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

DKUKEC

Re: Matching and Adding Cases

Thank you Andy,

I think that this is very helpful. I am wondering, how I would use this syntax to apply it to my full dataset which in some cases includes multiple cases and arrests, well over the number used in the example. I have a couple of examples where indiviudals have 70 arrests?

Thanks again,
Damir

Andy W

Re: Matching and Adding Cases

Yep - my experience suggests that there is always at least one guy with something like 40-50 arrests over a 5 year window in the various LE arrest databases I have worked with. Logic should be the same though, to merge the cases and arrest files you have to flatten one of them.

You will have to have real big tables to not be able to flatten and merge (although the bigger the tables the longer it will take). Reducing the arrest table to only the necessary people and only keeping necessary variables will make the computation faster as well.

To get any more useful advice you will have to be more specific as to what the problems with what I proposed are.

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

DKUKEC

Re: Matching and Adding Cases

Hi Andy,

I think that the syntax you provided will work very well - no problem at all. I was just wondering if there was a way to "loop" the flattening process?

"/MAKE ARREST_DT FROM ARREST_DT_1 TO ARREST_DT_4"

I noticed ARREST_DT_4 represents the number of charges per individual (RPUID). I was just wondering if there was a way to determine the number automatically as part of the process. I will do as you suggested by using only the necessary varaibles to do the caluclations.

Your syntax works very well and is very much appreciated.

Thanks again,
Damir

Andy W

Re: Matching and Adding Cases

It takes a little more work, but below I show an example of using a filler variable trick so you don't need to know how many cases the ARREST file gets flattened to. This is dependent on there being a unique ID within the arrest table to later table merge the other ARREST variables back in (I'm pretty sure I've stolen this idea from David Marso somewhere).

**********************************************************.
DATA LIST FREE / RPUID (F4) ARRESTID (F1.0) ARREST_DT (DATE11) CHARGE (F3).
BEGIN DATA.
6294 1 18-Apr-2005 1.00
6294 2 18-Apr-2005 1.00
6294 3 27-May-2005 2.00
6294 4 31-Jan-2010 6.00
1234 5 31-Jan-2010 1.00
END DATA.
DATASET NAME ARRESTS.
SORT CASES BY RPUID ARRESTID ARREST_DT CHARGE.
LIST RPUID ARREST_DT CHARGE.
EXECUTE.

DATA LIST FREE / RPUID (F4) CASE_ID (F4) STARTDATE (DATE11) ENDDATE (DATE11) .
BEGIN DATA.
6294 1 18-Apr-2005 18-Apr-2006
1234 2 18-Apr-2011 11-May-2011
6294 3 01-May-2004 27-May-2004
6294 4 01-Dec-2011 30-Nov-2012
4321 5 01-Dec-2011 30-Nov-2012
END DATA.
DATASET NAME CASES.
SORT CASES BY RPUID CASE_ID STARTDATE ENDDATE .
LIST RPUID CASE_ID STARTDATE ENDDATE .
EXECUTE.

*********************************************************.
*1: Flatten the arrest table.
DATASET ACTIVATE ARRESTS.
DATASET COPY ARRESTORIG.
DATASET ACTIVATE ARRESTS.
CASESTOVARS
/ID RPUID
/SEPERATOR = "_"
/DROP = ARREST_DT to CHARGE.
*1a: compute filler.
compute @ = -1.

*2: Table merge into cases.
DATASET ACTIVATE CASES.
MATCH FILES FILE = *
/TABLE = 'ARRESTS'
/BY RPUID.

*3: Reshape arrests - keep nulls.
*You will have to replace the end variable for each list.
VARSTOCASES
/MAKE ARRESTID FROM ARRESTID_1 to @
/INDEX filler (ARRESTID)
/NULL = KEEP.
*3a: select out fillers.
select if filler <> "@".
*3b: add in ARREST info.
sort cases by RPUID ARRESTID.
match files file = *
/table = 'ARRESTORIG'
/by RPUID ARRESTID.

*4: Get rid of duplicates.
SORT CASES BY RPUID STARTDATE ENDDATE ARREST_DT.
MATCH FILES FILE = *
/FIRST = Keep
/BY RPUID STARTDATE ENDDATE ARREST_DT.
SELECT IF Keep = 1.

*5: Get rid of empty arrests if there is another valid arrest.
COMPUTE MisArrest = MISSING(ARREST_DT).
SORT CASES BY RPUID MisArrest.
if RPUID = LAG(RPUID) AND MisArrest = 1 Keep = 0.
SELECT IF Keep = 1.

*6: Clean up.
MATCH FILES FILE = *
/DROP filler Keep MisArrest.
dataset close ARRESTS.
dataset close ARRESTORIG.
*********************************************************.

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/