syntax for creating a pattern variable

syntax for creating a pattern variable

J Scelza
I am working on a data set which looks like this:

caseID    date_emplymnt    ES  ESmean   ESDiff

1220       08/17/05        1     2        .
1220       03/09/06        3     2        2
1390       11/09/05        1     1.67     .
1390       02/08/06        1     1.67     0
1390       05/19/06        3     1.67     2
1445       08/15/05        3     2        .
1445       12/13/05        1     2       -2
1518       11/09/05        1     1.67     .
1518       02/08/06        3     1.67     2
1518       05/19/06        1     1.67    -2

where ES is "employment status" (1 = employed, 3=unemployed) and ESDiff
indicates the "change in that employment status" from 1 to 3 or vice versa,
which I am using only to spot check when a change occurs in the employment
status for each caseID.

Using a syntax script, I want to create a new variable with a particular
value based on the conditions of the variables ES and ESDiff, with the value
labels as follows: 1=employed, 2=unemployed, 3=oscillating, and 4=no change
(for each block of caseIDs).


caseID    date_emplymnt    ES  ESmean   ESDiff   ESChange

1220       08/17/05        1     2        .         1
1220       03/09/06        3     2        2         1
1390       11/09/05        1     1.67     .         1
1390       02/08/06        1     1.67     0         1
1390       05/19/06        3     1.67     2         1
1445       08/15/05        3     2        .         2
1445       12/13/05        1     2       -2         2
1518       11/09/05        1     1.67     .         3
1518       02/08/06        3     1.67     2         3
1518       05/19/06        1     1.67    -2         3




Thanks.

Re: syntax for creating a pattern variable

Maguin, Eugene
J. Scelza,

Based on your output table (much thanks for providing that), it looks like
you want to append a new variable called eschange that has the same value
for all cases with the same case id. I'm going to split my response into two
parts. The first part is that I don't understand how you calculate the value
of eschange. You say eschange has values of

>> ... 1=employed, 2=unemployed, 3=oscillating, and 4=no change
>> (for each block of caseIDs).

Look at case 1220. Employed at time 1, unemployed at time 2. I'd call that
'oscillating' rather than 'employed'. Same with 1390 and 1445. I agree with
you on 1518. So the first question is what are the rules for deciding
whether eschange equals 1, 2, 3, or 4. At this point, I can't yet see the
rule.


caseID    date_emplymnt    ES  ESmean   ESDiff   ESChange
1220       08/17/05        1     2        .         1
1220       03/09/06        3     2        2         1
1390       11/09/05        1     1.67     .         1
1390       02/08/06        1     1.67     0         1
1390       05/19/06        3     1.67     2         1
1445       08/15/05        3     2        .         2
1445       12/13/05        1     2       -2         2
1518       11/09/05        1     1.67     .         3
1518       02/08/06        3     1.67     2         3
1518       05/19/06        1     1.67    -2         3

The second part is that it would probably be much easier to convert your
dataset from a 'long' structure to a 'wide' structure before computing
eschange. As you may know, a wide structure means that each group of records
with the same caseid is restructured to fit into one record. Restructured in
that way the dataset would look like

CaseID empdate1   ES1  empdate2   ES2  empdate3   ES3
1220   08/17/05    1    03/09/06   3      .        .
1390   11/09/05    1    02/08/06   1   05/19/06    3
1445   08/15/05    3    12/13/05   1      .        .
1518   11/09/05    1    02/08/06   3   05/19/06    1

(I've deliberately omitted esmean and esdiff for simplicity but they
could/would be included in an actual restructuring.) The reason for doing
this is that as near as I can tell, calculating the value of eschange
requires either beginning with the first record in a set of records and
looking forwards or looking backwards over the set of records from the last
record. If you do have to do that sort of operation, it is much easier to
look across a set of values than across a set of cases. So you might
restructure the file from long to wide, compute eschange, and restructure
the file (back) from wide to long. Please read the documentation on the
Casestovars and the Varstocases commands in the syntax reference.
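
For anyone who finds the long-versus-wide distinction easier to see in a general-purpose language, here is a rough Python sketch of the reshaping described above. It only illustrates the data shape and is not SPSS syntax; the function name to_wide and the tuple layout are assumptions made for this example, and the rows are the ones from the original posting.

```python
from collections import defaultdict

# Long-format rows: (caseID, date, ES), already sorted by caseID and date.
long_rows = [
    (1220, "08/17/05", 1), (1220, "03/09/06", 3),
    (1390, "11/09/05", 1), (1390, "02/08/06", 1), (1390, "05/19/06", 3),
    (1445, "08/15/05", 3), (1445, "12/13/05", 1),
    (1518, "11/09/05", 1), (1518, "02/08/06", 3), (1518, "05/19/06", 1),
]

def to_wide(rows):
    """Collect each case's records into one wide record, padding with None."""
    grouped = defaultdict(list)
    for case_id, date, es in rows:
        grouped[case_id].append((date, es))
    # The widest case decides how many (date, ES) pairs every record gets.
    width = max(len(recs) for recs in grouped.values())
    wide = {}
    for case_id, recs in grouped.items():
        padded = recs + [(None, None)] * (width - len(recs))
        wide[case_id] = [value for pair in padded for value in pair]
    return wide

wide = to_wide(long_rows)
```

Note that the width has to be discovered from the data before any record can be laid out, which is the practical cost of the wide layout when cases have differing numbers of records.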

In terms of moving forward, I think we need an answer on the first question
and your evaluation of the restructuring idea.

Gene Maguin

Re: syntax for creating a pattern variable

Richard Ristow
To weigh in on one point - at 02:52 PM 3/16/2007, Gene Maguin wrote:

>Based on your output table (much thanks for providing that), it looks
>like you want to append a new variable called eschange that has the
>same value for all cases with the same case id. I'm going to split my
>response into two parts.

[followed by well-considered questions and observations, which I'm not
quoting]

>The second part is that it would probably be much easier to convert
>your dataset from a 'long' structure to a 'wide' structure before
>computing eschange. As you may know, a wide structure means that each
>group of records with the same caseid is restructured to fit into one
>record.

Without having analyzed the problem in detail, I'd lean the other way
on this.

SPSS has good facilities for computation across cases. Especially,
AGGREGATE, with a little pre-computing of variables so they'll
AGGREGATE as desired, can accomplish amazing things.

That's particularly true in cases like this, where you don't know ahead
of time how wide 'wide' is going to be - where there's no determinate
number of records, or maximum number of records, for an ID(1). It's
essentially impossible to write SPSS transformation code using a set of
variables, when you don't know how big that set is, except by going
into the data dictionary with Python and generating the code from that.
But AGGREGATE works nicely across an indefinite number of cases per ID;
as do other methods, using LAG, LEAVE, or scratch variables.
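
As a rough illustration of why no maximum group size is needed, here is a Python sketch of a LAG-style pass over sorted records. It is not SPSS code; the function name lagged_diffs and the (caseID, ES) tuple layout are assumptions for the example. The only state carried forward is the previous record, so the number of records per ID never matters.

```python
# Rows must already be sorted by caseID and date, as a LAG-based pass assumes.
rows = [(1220, 1), (1220, 3), (1390, 1), (1390, 1), (1390, 3)]

def lagged_diffs(rows):
    """Return ES minus the previous ES within the same caseID, else None."""
    diffs = []
    prev_id, prev_es = None, None
    for case_id, es in rows:
        if case_id == prev_id:
            diffs.append(es - prev_es)
        else:
            diffs.append(None)  # first record of a case: no predecessor
        prev_id, prev_es = case_id, es
    return diffs

diffs = lagged_diffs(rows)
```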

-Good luck to both,
  Richard
......................
(1) Contrast, where there's a fixed small number of cases, like 2, per
ID. See thread "data mgmt question for dyadic studies", Thu, 15 Mar
2007 <14:09:23 -0700>, ff.

Re: syntax for creating a pattern variable

J Scelza
In reply to this post by J Scelza
To answer the question about the rules for the ESChange variable, we are
looking at them in a linear context.

For example (and where empSTAT is 1=employed, 3=unemployed and ESChange is
1=employed to unemployed, 2=unemployed to employed, 3=oscillating, 4=no
change), I wanted to create an ESChange variable that might look like this:

caseID   empSTAT   empSTAT_DIFF  ESChange
1001        1          .            3
1001        3          2            3
1001        1         -2            3
1002        1          .            1
1002        3          2            1
1003        3          .            2
1003        3          0            2
1003        1         -2            2
1004        1          .            4
1004        1          0            4
1004        1          0            4


I included the empSTAT_DIFF variable (which calculates the difference in
empSTAT within each block of caseID) in case the syntax command to create
ESChange is conditional upon it.

I worry about transposing the dataset because there are other variables
included in the data set which I don't want to combine for each instance we
have data on a caseID.

Thanks,

Janene

Re: syntax for creating a pattern variable

Maguin, Eugene
Janene,

Much better explanation. Thank you. The value labels for eschange now make
sense given the values for empstat. So now let me try to put the rules into
words.

Given a set of records with the same caseid:
1) If empstat changes from employed to unemployed and back again to
employed, eschange is coded as 3=oscillating.
2) If empstat changes from unemployed to employed and back again to
unemployed, eschange is coded as 3=oscillating.
3) If empstat changes from employed to unemployed but does not subsequently
change back to employed, eschange is coded as 1=employed to unemployed.
4) If empstat changes from unemployed to employed but does not subsequently
change back to unemployed, eschange is coded as 2=unemployed to employed.
5) If empstat is employed for all records in the set, eschange is coded as
4=no change.
6) If empstat is unemployed for all records in the set, eschange is coded as
4=no change.

Do you agree with this statement of the rules?
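
One way to check a reading of these rules is to write them down as a small function. The following Python sketch is only that, a sketch: it assumes the records are in time order and that empstat takes only the values 1 and 3, and the function name classify is made up for the example. It collapses runs of repeated statuses and then counts what is left. Sequences with more than one reversal are treated as oscillating, which the six rules do not spell out explicitly.

```python
from itertools import groupby

def classify(statuses):
    """Classify one case's time-ordered statuses (1=employed, 3=unemployed)."""
    if not statuses:
        raise ValueError("a case needs at least one record")
    # Collapse consecutive repeats, e.g. [3, 3, 1] becomes [3, 1].
    runs = [status for status, _ in groupby(statuses)]
    if len(runs) == 1:
        return 4  # rules 5 and 6: same status in every record
    if len(runs) == 2:
        # rules 3 and 4: one change that is never reversed
        return 1 if runs[0] == 1 else 2
    return 3  # rules 1 and 2: changed and later changed back
```

Applied to the four example cases in the previous message, this gives 3 for 1001, 1 for 1002, 2 for 1003, and 4 for 1004.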

You say
>>I worry about transposing the dataset because there are other variables
>>included in the data set which I don't want to combine for each instance we
>>have data on a caseID.

Would you be willing to consider extracting just the variables needed for
computing eschange to a new file, doing the rearrangements with that file
and then matching the resulting file, which might have just two variables
caseid and eschange, back to the original file?
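
The extract, compute, and match-back idea can be sketched in a few lines of Python. This is an illustration of the workflow only, not SPSS syntax: the column other_var is hypothetical, and the per-case summary used here (whether the status ever varied) is a stand-in for the real eschange rules.

```python
# Original file: other variables stay untouched (other_var is hypothetical).
original = [
    {"caseID": 1002, "empSTAT": 1, "other_var": "a"},
    {"caseID": 1002, "empSTAT": 3, "other_var": "b"},
    {"caseID": 1004, "empSTAT": 1, "other_var": "c"},
    {"caseID": 1004, "empSTAT": 1, "other_var": "d"},
]

# Step 1: extract only the variables needed for the computation.
extract = [(row["caseID"], row["empSTAT"]) for row in original]

# Step 2: reduce the extract to one value per case. The summary is a
# stand-in (did the status ever vary?), not the real eschange rules.
statuses = {}
for case_id, status in extract:
    statuses.setdefault(case_id, set()).add(status)
per_case = {case_id: len(seen) > 1 for case_id, seen in statuses.items()}

# Step 3: match the per-case value back onto every original record.
merged = [dict(row, changed=per_case[row["caseID"]]) for row in original]
```

The original records keep all of their other variables and their original count; only one new column is attached.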

Last question. Do you have a sequence number variable that identifies the
order of records within a set of records with the same case id?

Gene Maguin

Re: syntax for creating a pattern variable

Richard Ristow
In reply to this post by J Scelza
At 11:05 AM 3/19/2007, J Scelza wrote:

>To answer the question about the rules for the ESChange variable, we
>are looking at them in a linear context.

"linear" == "sequential in time?"

>For example (and where empSTAT is 1=employed, 3=unemployed and
>ESChange is 1=employed to unemployed, 2=unemployed to employed,
>3=oscillating, 4=no change)

OK: ESChange is
4 - Client employed (empSTAT=1) OR unemployed (empSTAT=3) in all
records
1 - Client employed (empSTAT=1) in one record and unemployed
(empSTAT=3) in the following record, but never the reverse
2 - Client unemployed (empSTAT=3) in one record and employed
(empSTAT=1) in the following record, but never the reverse
3 - Change from 1 to 3, and from 3 to 1, both occur in the client's
records.

Try this, with AGGREGATE logic; SPSS 15 draft output. Test data is from
your first posting. (Gene, the date identifies the sequence number
within one caseID; the second posting omitted the dates.)

* ............   Post after this point   ............          .
* ...................................................          .
LIST /* with ESMean, ESdiff dropped */ .

List
|-----------------------------|---------------------------|
|Output Created               |20-MAR-2007 10:50:40       |
|-----------------------------|---------------------------|
caseID   DateEmpl  ES

  1220  08/17/2005   1
  1220  03/09/2006   3
  1390  11/09/2005   1
  1390  02/08/2006   1
  1390  05/19/2006   3
  1445  08/15/2005   3
  1445  12/13/2005   1
  1518  11/09/2005   1
  1518  02/08/2006   3
  1518  05/19/2006   1

Number of cases read:  10    Number of cases listed:  10


*  OK: ESChange is                                             .
*  4 - Client employed (empSTAT=1) OR unemployed (empSTAT=3)   .
*      in all records                                          .
*  1 - Client employed (empSTAT=1) in one record and unemployed.
*      (empSTAT=3) in the following record, but never the reverse .
*  2 - Client unemployed (empSTAT=3) in one record and employed.
*      (empSTAT=1) in the following record, but never the reverse .
*  3 - Both of the above changes occur in the client's records .

*  Re-compute ESDiff:                                          .
.  NUMERIC    ESDiff (F3).
.  COMPUTE    ESDiff = $SYSMIS.
.  IF  caseID EQ LAG(caseID) ESDiff = ES - LAG(ES).

.  /**/  LIST  /*-*/.

List
|-----------------------------|---------------------------|
|Output Created               |20-MAR-2007 10:50:41       |
|-----------------------------|---------------------------|
caseID   DateEmpl  ES ESDiff

  1220  08/17/2005   1     .
  1220  03/09/2006   3     2
  1390  11/09/2005   1     .
  1390  02/08/2006   1     0
  1390  05/19/2006   3     2
  1445  08/15/2005   3     .
  1445  12/13/2005   1    -2
  1518  11/09/2005   1     .
  1518  02/08/2006   3     2
  1518  05/19/2006   1    -2

Number of cases read:  10    Number of cases listed:  10


AGGREGATE OUTFILE=* MODE=ADDVARIABLES
    /BREAK   = caseID
    /ESmean  = MEAN(ES)
    /MAXDiff = MAX(ESDiff)
    /MINDiff = MIN(ESDiff).

FORMATS   ESmean (F5.2).

NUMERIC   ESChange (F3).
VAR LABEL ESChange 'Summary of employment changes; requested variable'.
VAL LABEL ESChange
    1 'Became unemployed'
    2 'Became employed'
    3 'Change both ways'
    4 'No status change'.

DO IF   MISSING (MAXDiff)
     OR  MISSING (MINDiff).
.  COMPUTE ESChange = 4.
ELSE IF  MAXDiff EQ 0
      AND MINDiff EQ 0.
.  COMPUTE ESChange = 4.
ELSE IF MAXDiff  GT 0  /* Became unemployed       */
     AND MINDiff  GE 0  /* Never became employed   */.
.  COMPUTE ESChange = 1.
ELSE IF MAXDiff  LE 0  /* Never became unemployed */
     AND MINDiff  LT 0  /* Became employed         */.
.  COMPUTE ESChange = 2.
ELSE IF MAXDiff  GT 0  /* Became unemployed       */
     AND MINDiff  LT 0  /* Became employed         */.
.  COMPUTE ESChange = 3.
ELSE.
.  COMPUTE ESChange = 9.
END IF.

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |20-MAR-2007 10:50:42       |
|-----------------------------|---------------------------|
caseID   DateEmpl  ES ESDiff ESmean MAXDiff MINDiff ESChange

  1220  08/17/2005   1     .    2.00     2       2        1
  1220  03/09/2006   3     2    2.00     2       2        1
  1390  11/09/2005   1     .    1.67     2       0        1
  1390  02/08/2006   1     0    1.67     2       0        1
  1390  05/19/2006   3     2    1.67     2       0        1
  1445  08/15/2005   3     .    2.00    -2      -2        2
  1445  12/13/2005   1    -2    2.00    -2      -2        2
  1518  11/09/2005   1     .    1.67     2      -2        3
  1518  02/08/2006   3     2    1.67     2      -2        3
  1518  05/19/2006   1    -2    1.67     2      -2        3

Number of cases read:  10    Number of cases listed:  10
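
For readers who want to check the MAXDiff/MINDiff logic outside SPSS, the following Python sketch mirrors the AGGREGATE and DO IF steps above. It is a hedged re-expression, not the SPSS job itself: the function name es_change and the use of None for system-missing are assumptions of the example. The (caseID, ESDiff) pairs are copied from the second listing.

```python
# (caseID, ESDiff) pairs from the listing above; None stands for system-missing.
diffs = [
    (1220, None), (1220, 2),
    (1390, None), (1390, 0), (1390, 2),
    (1445, None), (1445, -2),
    (1518, None), (1518, 2), (1518, -2),
]

def es_change(diffs):
    """Map each caseID to its ESChange code using per-case max/min of ESDiff."""
    by_case = {}
    for case_id, d in diffs:
        valid = by_case.setdefault(case_id, [])
        if d is not None:  # like MAX and MIN, skip missing values
            valid.append(d)
    result = {}
    for case_id, valid in by_case.items():
        if not valid:  # only one record: both aggregates are missing
            result[case_id] = 4
            continue
        max_diff, min_diff = max(valid), min(valid)
        if max_diff == 0 and min_diff == 0:
            result[case_id] = 4  # no status change
        elif max_diff > 0 and min_diff >= 0:
            result[case_id] = 1  # became unemployed, never the reverse
        elif max_diff <= 0 and min_diff < 0:
            result[case_id] = 2  # became employed, never the reverse
        elif max_diff > 0 and min_diff < 0:
            result[case_id] = 3  # change both ways
        else:
            result[case_id] = 9  # should be unreachable, as in the DO IF
    return result

codes = es_change(diffs)
```

On the test data this gives 1 for 1220 and 1390, 2 for 1445, and 3 for 1518, matching the ESChange column in the listing.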

=======================================
APPENDIX: Test data, from first posting
=======================================
* ............   Test data               ............          .
* (from the original posting)                                  .


DATA LIST LIST SKIP=2
    /caseID (N) DateEmpl (ADATE) ES  ESmean  ESDiff (3F).

BEGIN DATA
     caseID     date_emplymnt    ES  ESmean   ESDiff

     1220        08/17/05        1     2        .
     1220        03/09/06        3     2        2
     1390        11/09/05        1     1.67     .
     1390        02/08/06        1     1.67     0
     1390        05/19/06        3     1.67     2
     1445        08/15/05        3     2        .
     1445        12/13/05        1     2       -2
     1518        11/09/05        1     1.67     .
     1518        02/08/06        3     1.67     2
     1518        05/19/06        1     1.67    -2
END DATA.
FORMATS caseID             (N4)
        /DateEmpl           (ADATE10)
        /ES  ESmean  ESDiff (F3).

SORT CASES BY caseID DateEmpl.
DELETE VARIABLES ESMean ESDiff.