SPSSX Discussion

syntax for creating a pattern variable


J Scelza

syntax for creating a pattern variable

I am working on a data set which looks like this:

caseID date_emplymnt ES ESmean ESDiff

1220 08/17/05 1 2 .
1220 03/09/06 3 2 2
1390 11/09/05 1 1.67 .
1390 02/08/06 1 1.67 0
1390 05/19/06 3 1.67 2
1445 08/15/05 3 2 .
1445 12/13/05 1 2 -2
1518 11/09/05 1 1.67 .
1518 02/08/06 3 1.67 2
1518 05/19/06 1 1.67 -2

where ES is "employment status" (1 = employed, 3=unemployed) and ESDiff
indicates the "change in that employment status" from 1 to 3 or vice versa,
which I am using only to spot check when a change occurs in the employment
status for each caseID.

Using a syntax script, I want to create a new variable with a particular
value based on the conditions of the variables ES and ESDiff, with the value
labels as follows: 1=employed, 2=unemployed, 3=oscillating, and 4=no change
(for each block of caseIDs).

caseID date_emplymnt ES ESmean ESDiff ESChange

1220 08/17/05 1 2 . 1
1220 03/09/06 3 2 2 1
1390 11/09/05 1 1.67 . 1
1390 02/08/06 1 1.67 0 1
1390 05/19/06 3 1.67 2 1
1445 08/15/05 3 2 . 2
1445 12/13/05 1 2 -2 2
1518 11/09/05 1 1.67 . 3
1518 02/08/06 3 1.67 2 3
1518 05/19/06 1 1.67 -2 3

Thanks.

Maguin, Eugene

Re: syntax for creating a pattern variable

J. Scelza,

Based on your output table (much thanks for providing that), it looks like
you want to append a new variable called eschange that has the same value
for all cases with the same case id. I'm going to split my response into two
parts. The first part is that I don't understand how you calculate the value
of eschange. You say eschange has values of

>> ... 1 = employed, 2=unemployed, 3=oscilating, and 4=no change
(for each block of caseIDs).

Look at case 1220. Employed at time 1, unemployed at time 2. I'd call that
'oscillating' rather than 'employed'. Same with 1390 and 1445. I agree with
you on 1518. So the first question is what are the rules for deciding
whether eschange equals 1, 2, 3, or 4. At this point, I can't yet see the
rule.

caseID date_emplymnt ES ESmean ESDiff ESChange
1220 08/17/05 1 2 . 1
1220 03/09/06 3 2 2 1
1390 11/09/05 1 1.67 . 1
1390 02/08/06 1 1.67 0 1
1390 05/19/06 3 1.67 2 1
1445 08/15/05 3 2 . 2
1445 12/13/05 1 2 -2 2
1518 11/09/05 1 1.67 . 3
1518 02/08/06 3 1.67 2 3
1518 05/19/06 1 1.67 -2 3

The second part is that it would probably be much easier to convert your
dataset from a 'long' structure to a 'wide' structure before computing
eschange. As you may know, a wide structure means that each group of records
with the same caseid is restructured to fit into one record. Restructured in
that way the dataset would look like

CaseID empdate1 ES1 empdate2 ES2 empdate3 ES3
1220 08/17/05 1 03/09/06 3 . .
1390 11/09/05 1 02/08/06 1 05/19/06 3
1445 08/15/05 3 12/13/05 1 . .
1518 11/09/05 1 02/08/06 3 05/19/06 1

(I've deliberately omitted esmean and esdiff for simplicity but they
could/would be included in an actual restructuring.) The reason for doing
this is that as near as I can tell, calculating the value of eschange
requires either beginning with the first record in a set of records and
looking forwards or looking backwards over the set of records from the last
record. If you do have to do that sort of operation, it is much easier to
look across a set of values than across a set of cases. So you might
restructure the file from long to wide, compute eschange, and restructure
the file (back) from wide to long. Please read the documentation on the
Casestovars and the Varstocases commands in the syntax reference.
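
A minimal sketch of that round trip, not a tested solution. It assumes the working file holds only caseID, DateEmpl and ES (the names used in the test data later in this thread), that no caseID has more than three records, and that the index name seq is free to use. The suffixed names ES.1 through ES.3 are what CASESTOVARS generates by default when no INDEX variable is given.

```spss
* Sketch only: long to wide and back again.
* Assumes caseID, DateEmpl, ES and at most three records per caseID.
SORT CASES BY caseID DateEmpl.

* One record per caseID, with DateEmpl.1 to DateEmpl.3 and ES.1 to ES.3.
CASESTOVARS
  /ID = caseID.

* Compute eschange from ES.1, ES.2, ES.3 at this point.

* Back to one record per caseID and date; seq is a hypothetical index name.
VARSTOCASES
  /MAKE DateEmpl FROM DateEmpl.1 DateEmpl.2 DateEmpl.3
  /MAKE ES       FROM ES.1 ES.2 ES.3
  /INDEX = seq
  /NULL  = DROP.
```

Any variable computed in the wide file and not named on a MAKE subcommand, such as eschange, is carried onto every record of its caseID when the file goes back to long.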

In terms of moving forward, I think we need an answer on the first question
and your evaluation of the restructuring idea.

Gene Maguin

Richard Ristow

Re: syntax for creating a pattern variable

To weigh in on one point - at 02:52 PM 3/16/2007, Gene Maguin wrote:

>Based on your output table (much thanks for providing that), it looks
>like you want to append a new variable called eschange that has the
>same value for all cases with the same case id. I'm going to split my
>response into two parts.

[followed by well-considered questions and observations, which I'm not
quoting]

>The second part is that it would probably be much easier to convert
>your dataset from a 'long' structure to a 'wide' structure before
>computing eschange. As you may know, a wide structure means that each
>group of records with the same caseid is restructured to fit into one
>record.

Without having analyzed the problem in detail, I'd lean the other way
on this.

SPSS has good facilities for computation across cases. Especially,
AGGREGATE, with a little pre-computing of variables so they'll
AGGREGATE as desired, can accomplish amazing things.

That's particularly true in cases like this, where you don't know ahead
of time how wide 'wide' is going to be - where there's no determinate
number of records, or maximum number of records, for an ID(1). It's
essentially impossible to write SPSS transformation code using a set of
variables, when you don't know how big that set is, except by going
into the data dictionary with Python and generating the code from that.
But AGGREGATE works nicely across an indefinite number of cases per ID;
as do other methods, using LAG, LEAVE, or scratch variables.
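
As a rough illustration of "pre-computing variables so they'll AGGREGATE as desired", the sketch below flags each direction of change on the record where it happens and then takes the per-ID maximum of each flag. The names BecameUnemp, BecameEmp, AnyUnemp, AnyEmp and NRecs are invented for the example, and it assumes ES is coded 1=employed, 3=unemployed and that dates give the order within a caseID.

```spss
* Sketch only: flag changes record by record, then summarize per ID.
* BecameUnemp, BecameEmp, AnyUnemp, AnyEmp, NRecs are illustrative names.
SORT CASES BY caseID DateEmpl.

COMPUTE BecameUnemp = 0.
COMPUTE BecameEmp   = 0.
* Compare each record with the previous record of the same caseID.
IF (caseID EQ LAG(caseID) AND LAG(ES) EQ 1 AND ES EQ 3) BecameUnemp = 1.
IF (caseID EQ LAG(caseID) AND LAG(ES) EQ 3 AND ES EQ 1) BecameEmp   = 1.

* Works for any number of records per caseID.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK    = caseID
  /AnyUnemp = MAX(BecameUnemp)
  /AnyEmp   = MAX(BecameEmp)
  /NRecs    = N.
```

The two per-ID flags are enough to tell the patterns apart: neither set means no change, one set means a change in one direction only, and both set means changes in both directions.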

-Good luck to both,
Richard
......................
(1) Contrast, where there's a fixed small number of cases, like 2, per
ID. See thread "data mgmt question for dyadic studies", Thu, 15 Mar
2007 <14:09:23 -0700>, ff.

J Scelza

Re: syntax for creating a pattern variable

In reply to this post by J Scelza

To answer the question about the rules for the ESChange variable, we are
looking at them in a linear context.

For example (and where empSTAT is 1=employed, 3=unemployed and ESChange is
1=employed to unemployed, 2=unemployed to employed, 3=oscillating, 4=no
change), I wanted to create an ESChange variable that might look like this:

caseID empSTAT empSTAT_DIFF ESChange
1001 1 . 3
1001 3 2 3
1001 1 -2 3
1002 1 . 1
1002 3 2 1
1003 3 . 2
1003 3 0 2
1003 1 -2 2
1004 1 . 4
1004 1 0 4
1004 1 0 4

I included the empSTAT_DIFF variable (which calculates the difference in
empSTAT within each block of caseID) in case the syntax to create ESChange
depends on it.

I worry about transposing the dataset because there are other variables
included in the data set which I don't want to combine for each instance we
have data on a caseID.

Thanks,

Janene

Maguin, Eugene

Re: syntax for creating a pattern variable

Janene,

Much better explanation. Thank you. The value labels for eschange now make
sense given the values for empstat. So now let me try to put the rules into
words.

Given a set of records with the same caseid.
1) If empstat changes from employed to unemployed and back again to
employed, eschange is coded as 3=oscillating.
2) If empstat changes from unemployed to employed and back again to
unemployed, eschange is coded as 3=oscillating.
3) If empstat changes from employed to unemployed but does not subsequently
change back to employed, eschange is coded as 1=employed to unemployed.
4) If empstat changes from unemployed to employed but does not subsequently
change back to unemployed, eschange is coded as 2=unemployed to employed.
5) If empstat is employed for all records in the set, eschange is coded as
4=no change.
6) If empstat is unemployed for all records in the set, eschange is coded as
4=no change.

Do you agree with this statement of the rules?

You say
>>I worry about transposing the dataset because there are other variables
included in the data set which I don't want to combine for each instance we
have data on a caseID.

Would you be willing to consider extracting just the variables needed for
computing eschange to a new file, doing the rearrangements with that file
and then matching the resulting file, which might have just two variables
caseid and eschange, back to the original file?
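
A sketch of that extract, compute and match-back sequence, under stated assumptions rather than as a finished solution. The file names full.sav and eschange_by_id.sav and the flag names down and up are hypothetical, the variables are caseID, DateEmpl and ES as in the test data, and the wide-form computation assumes no caseID has more than three records.

```spss
* Sketch only. File names and the names down, up are hypothetical.
* Assumes at most three records per caseID.
SORT CASES BY caseID DateEmpl.
SAVE OUTFILE = 'full.sav'.

* Work on a copy that holds only what the computation needs.
GET FILE = 'full.sav' /KEEP = caseID ES.
CASESTOVARS
  /ID = caseID.

* Flag each direction of change between adjacent records.
COMPUTE down = 0.
COMPUTE up   = 0.
DO REPEAT a = ES.1 ES.2 /b = ES.2 ES.3.
.  IF (a EQ 1 AND b EQ 3) down = 1.
.  IF (a EQ 3 AND b EQ 1) up   = 1.
END REPEAT.

COMPUTE eschange = 4.
IF (down EQ 1 AND up EQ 0) eschange = 1.
IF (down EQ 0 AND up EQ 1) eschange = 2.
IF (down EQ 1 AND up EQ 1) eschange = 3.

SAVE OUTFILE = 'eschange_by_id.sav' /KEEP = caseID eschange.

* Attach eschange to every record of the original file.
MATCH FILES
  /FILE  = 'full.sav'
  /TABLE = 'eschange_by_id.sav'
  /BY caseID.
EXECUTE.
```

Because only caseID and ES are restructured, the other variables in the original file are never combined or transposed; they are simply read back unchanged by MATCH FILES.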

Last question. Do you have a sequence number variable that identifies the
order of records within a set of records with the same case id?

Gene Maguin

Richard Ristow

Re: syntax for creating a pattern variable

In reply to this post by J Scelza

At 11:05 AM 3/19/2007, J Scelza wrote:

>To answer the question about the rules for the ESChange variable, we
>are
>looking at them in a linear context.

"linear" == "sequential in time?"

>For example (and where empSTAT is 1=employed, 3=unemployed and
>ESChange is 1=employed to unemployed, 2=unemployed to employed,
>3=oscillating, 4=no change)

OK: ESChange is
4 - Client employed (empSTAT=1) OR unemployed (empSTAT=3) in all
records
1 - Client employed (empSTAT=1) in one record and unemployed
(empSTAT=3) in the following record, but never the reverse
2 - Client unemployed (empSTAT=3) in one record and employed
(empSTAT=1) in the following record, but never the reverse
3 - Change from 1 to 3, and from 3 to 1, both occur in the client's
records.

Try this, with AGGREGATE logic; SPSS 15 draft output. Test data is from
your first posting. (Gene, the date identifies the sequence number
within one caseID; the second posting omitted the dates.)

* ............ Post after this point ............ .
* ................................................... .
LIST /* with ESMean, ESdiff dropped */ .

List
|-----------------------------|---------------------------|
|Output Created |20-MAR-2007 10:50:40 |
|-----------------------------|---------------------------|
caseID DateEmpl ES

1220 08/17/2005 1
1220 03/09/2006 3
1390 11/09/2005 1
1390 02/08/2006 1
1390 05/19/2006 3
1445 08/15/2005 3
1445 12/13/2005 1
1518 11/09/2005 1
1518 02/08/2006 3
1518 05/19/2006 1

Number of cases read: 10 Number of cases listed: 10

* OK: ESChange is .
*  4 - Client employed (empSTAT=1) OR unemployed (empSTAT=3) .
*      in all records .
*  1 - Client employed (empSTAT=1) in one record and unemployed .
*      (empSTAT=3) in the following record, but never the reverse .
*  2 - Client unemployed (empSTAT=3) in one record and employed .
*      (empSTAT=1) in the following record, but never the reverse .
*  3 - Both of the above changes occur in the client's records. .

* Re-compute ESDiff: .
. NUMERIC ESDiff (F3).
. COMPUTE ESDiff = $SYSMIS.
. IF caseID EQ LAG(caseID) ESDiff = ES - LAG(ES).

. /**/ LIST /*-*/.

List
|-----------------------------|---------------------------|
|Output Created |20-MAR-2007 10:50:41 |
|-----------------------------|---------------------------|
caseID DateEmpl ES ESDiff

1220 08/17/2005 1 .
1220 03/09/2006 3 2
1390 11/09/2005 1 .
1390 02/08/2006 1 0
1390 05/19/2006 3 2
1445 08/15/2005 3 .
1445 12/13/2005 1 -2
1518 11/09/2005 1 .
1518 02/08/2006 3 2
1518 05/19/2006 1 -2

Number of cases read: 10 Number of cases listed: 10

AGGREGATE OUTFILE=* MODE=ADDVARIABLES
/BREAK = caseID
/ESmean = MEAN(ES)
/MAXDiff = MAX(ESDiff)
/MINDiff = MIN(ESDiff).

FORMATS ESmean (F5.2).

NUMERIC ESChange (F3).
VAR LABEL ESChange 'Summary of employment changes; requested variable'.
VAL LABEL ESChange
1 'Became unemployed'
2 'Became employed'
3 'Change both ways'
4 'No status change'.

DO IF    MISSING (MAXDiff)
      OR MISSING (MINDiff).
.  COMPUTE ESChange = 4.
ELSE IF  MAXDiff EQ 0
     AND MINDiff EQ 0.
.  COMPUTE ESChange = 4.
ELSE IF  MAXDiff GT 0 /* Became unemployed */
     AND MINDiff GE 0 /* Never became employed */.
.  COMPUTE ESChange = 1.
ELSE IF  MAXDiff LE 0 /* Never became unemployed */
     AND MINDiff LT 0 /* Became employed */.
.  COMPUTE ESChange = 2.
ELSE IF  MAXDiff GT 0 /* Became unemployed */
     AND MINDiff LT 0 /* Became employed */.
.  COMPUTE ESChange = 3.
ELSE.
.  COMPUTE ESChange = 9.
END IF.

LIST.

List
|-----------------------------|---------------------------|
|Output Created |20-MAR-2007 10:50:42 |
|-----------------------------|---------------------------|
caseID DateEmpl ES ESDiff ESmean MAXDiff MINDiff ESChange

1220 08/17/2005 1 . 2.00 2 2 1
1220 03/09/2006 3 2 2.00 2 2 1
1390 11/09/2005 1 . 1.67 2 0 1
1390 02/08/2006 1 0 1.67 2 0 1
1390 05/19/2006 3 2 1.67 2 0 1
1445 08/15/2005 3 . 2.00 -2 -2 2
1445 12/13/2005 1 -2 2.00 -2 -2 2
1518 11/09/2005 1 . 1.67 2 -2 3
1518 02/08/2006 3 2 1.67 2 -2 3
1518 05/19/2006 1 -2 1.67 2 -2 3

Number of cases read: 10 Number of cases listed: 10

=======================================
APPENDIX: Test data, from first posting
=======================================
* ............ Test data ............ .
* (from the original posting) .

DATA LIST LIST SKIP=2
/caseID (N) DateEmpl (ADATE) ES ESmean ESDiff (3F).

BEGIN DATA
caseID date_emplymnt ES ESmean ESDiff

1220 08/17/05 1 2 .
1220 03/09/06 3 2 2
1390 11/09/05 1 1.67 .
1390 02/08/06 1 1.67 0
1390 05/19/06 3 1.67 2
1445 08/15/05 3 2 .
1445 12/13/05 1 2 -2
1518 11/09/05 1 1.67 .
1518 02/08/06 3 1.67 2
1518 05/19/06 1 1.67 -2
END DATA.
FORMATS caseID (N4)
/DateEmpl (ADATE10)
/ES ESmean ESDiff (F3).

SORT CASES BY caseID DateEmpl.
DELETE VARIABLES ESMean ESDiff.