I am working on a data set which looks like this:
caseID  date_emplymnt  ES  ESmean  ESDiff
1220    08/17/05       1   2       .
1220    03/09/06       3   2       2
1390    11/09/05       1   1.67    .
1390    02/08/06       1   1.67    0
1390    05/19/06       3   1.67    2
1445    08/15/05       3   2       .
1445    12/13/05       1   2       -2
1518    11/09/05       1   1.67    .
1518    02/08/06       3   1.67    2
1518    05/19/06       1   1.67    -2

where ES is "employment status" (1 = employed, 3 = unemployed) and ESDiff indicates the change in that employment status from 1 to 3 or vice versa. I am using ESDiff only to spot-check when the employment status changes for each caseID.

Using a syntax script, I want to create a new variable whose value depends on the variables ES and ESDiff, with the value labels as follows: 1 = employed, 2 = unemployed, 3 = oscillating, and 4 = no change (for each block of caseIDs). The result should look like this:

caseID  date_emplymnt  ES  ESmean  ESDiff  ESChange
1220    08/17/05       1   2       .       1
1220    03/09/06       3   2       2       1
1390    11/09/05       1   1.67    .       1
1390    02/08/06       1   1.67    0       1
1390    05/19/06       3   1.67    2       1
1445    08/15/05       3   2       .       2
1445    12/13/05       1   2       -2      2
1518    11/09/05       1   1.67    .       3
1518    02/08/06       3   1.67    2       3
1518    05/19/06       1   1.67    -2      3

Thanks.
J. Scelza,
Based on your output table (much thanks for providing that), it looks like you want to append a new variable called eschange that has the same value for all cases with the same case id. I'm going to split my response into two parts.

The first part is that I don't understand how you calculate the value of eschange. You say eschange has values of

>> ... 1 = employed, 2=unemployed, 3=oscillating, and 4=no change (for each block of caseIDs).

Look at case 1220. Employed at time 1, unemployed at time 2. I'd call that 'oscillating' rather than 'employed'. Same with 1390 and 1445. I agree with you on 1518. So the first question is: what are the rules for deciding whether eschange equals 1, 2, 3, or 4? At this point, I can't yet see the rule.

caseID  date_emplymnt  ES  ESmean  ESDiff  ESChange
1220    08/17/05       1   2       .       1
1220    03/09/06       3   2       2       1
1390    11/09/05       1   1.67    .       1
1390    02/08/06       1   1.67    0       1
1390    05/19/06       3   1.67    2       1
1445    08/15/05       3   2       .       2
1445    12/13/05       1   2       -2      2
1518    11/09/05       1   1.67    .       3
1518    02/08/06       3   1.67    2       3
1518    05/19/06       1   1.67    -2      3

The second part is that it would probably be much easier to convert your dataset from a 'long' structure to a 'wide' structure before computing eschange. As you may know, a wide structure means that each group of records with the same caseid is restructured to fit into one record. Restructured in that way, the dataset would look like

caseID  empdate1  ES1  empdate2  ES2  empdate3  ES3
1220    08/17/05  1    03/09/06  3    .         .
1390    11/09/05  1    02/08/06  1    05/19/06  3
1445    08/15/05  3    12/13/05  1    .         .
1518    11/09/05  1    02/08/06  3    05/19/06  1

(I've deliberately omitted esmean and esdiff for simplicity, but they could/would be included in an actual restructuring.)

The reason for doing this is that, as near as I can tell, calculating the value of eschange requires either beginning with the first record in a set of records and looking forwards, or looking backwards over the set of records from the last record. If you do have to do that sort of operation, it is much easier to look across a set of values than across a set of cases. So you might restructure the file from long to wide, compute eschange, and restructure the file (back) from wide to long. Please read the documentation on the CASESTOVARS and VARSTOCASES commands in the syntax reference.

In terms of moving forward, I think we need an answer on the first question and your evaluation of the restructuring idea.

Gene Maguin
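For readers who have not used these two commands, here is a minimal sketch of the long-to-wide-to-long round trip. It is not from the thread: it assumes the variable names of the first posting (caseID, date_emplymnt, ES, ESmean, ESDiff), that date_emplymnt is a real date variable so that sorting is chronological, and that no caseID has more than three records. With the default separator, CASESTOVARS names the new variables ES.1, ES.2, ES.3 rather than ES1, ES2, ES3 as in the table above; seq is a hypothetical name for the index variable.

```spss
* Sketch only: assumes at most 3 records per caseID.
SORT CASES BY caseID date_emplymnt.

* Long to wide: one record per caseID, with ES.1 to ES.3 and date_emplymnt.1 to date_emplymnt.3.
CASESTOVARS
  /ID = caseID
  /DROP = ESmean ESDiff.

* Compute eschange here from ES.1 to ES.3.

* Wide to long again: records whose date and ES are both missing are dropped by default.
VARSTOCASES
  /MAKE date_emplymnt FROM date_emplymnt.1 date_emplymnt.2 date_emplymnt.3
  /MAKE ES FROM ES.1 ES.2 ES.3
  /INDEX = seq.
```

The VARSTOCASES step has to list every wide variable by name, so the syntax depends on knowing the largest number of records any caseID has.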
To weigh in on one point - at 02:52 PM 3/16/2007, Gene Maguin wrote:
>Based on your output table (much thanks for providing that), it looks
>like you want to append a new variable called eschange that has the
>same value for all cases with the same case id. I'm going to split my
>response into two parts.

[followed by well-considered questions and observations, which I'm not quoting]

>The second part is that it would probably be much easier to convert
>your dataset from a 'long' structure to a 'wide' structure before
>computing eschange. As you may know, a wide structure means that each
>group of records with the same caseid is restructured to fit into one
>record.

Without having analyzed the problem in detail, I'd lean the other way on this. SPSS has good facilities for computation across cases. Especially, AGGREGATE, with a little pre-computing of variables so they'll AGGREGATE as desired, can accomplish amazing things.

That's particularly true in cases like this, where you don't know ahead of time how wide 'wide' is going to be - where there's no determinate number of records, or maximum number of records, for an ID(1). It's essentially impossible to write SPSS transformation code using a set of variables, when you don't know how big that set is, except by going into the data dictionary with Python and generating the code from that. But AGGREGATE works nicely across an indefinite number of cases per ID; as do other methods, using LAG, LEAVE, or scratch variables.

-Good luck to both,
Richard

......................
(1) Contrast, where there's a fixed small number of cases, like 2, per ID. See thread "data mgmt question for dyadic studies", Thu, 15 Mar 2007 <14:09:23 -0700>, ff.
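To illustrate the point about an indefinite number of cases per ID, here is a small sketch (not Richard's code). It assumes the variables caseID and ES from the first posting; NRecs, ESmin, ESmax and NoChange are hypothetical names. The same syntax runs unchanged whether an ID has one record or fifty.

```spss
* Sketch only: NRecs, ESmin, ESmax and NoChange are hypothetical names.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK = caseID
  /NRecs = N
  /ESmin = MIN(ES)
  /ESmax = MAX(ES).

* NoChange is 1 when ES is the same in every record of the ID, 0 otherwise.
COMPUTE NoChange = (ESmin EQ ESmax).
EXECUTE.
```

This only separates "no change" from "some change"; telling the direction of a change apart needs a record-to-record difference, for example one computed with LAG.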
To answer the question about the rules for the ESChange variable, we are looking at them in a linear context. For example (and where empSTAT is 1=employed, 3=unemployed and ESChange is 1=employed to unemployed, 2=unemployed to employed, 3=oscillating, 4=no change), I wanted to create an ESChange variable that might look like this:

caseID  empSTAT  empSTAT_DIFF  ESChange
1001    1        .             3
1001    3        2             3
1001    1        -2            3
1002    1        .             1
1002    3        2             1
1003    3        .             2
1003    3        0             2
1003    1        -2            2
1004    1        .             4
1004    1        0             4
1004    1        0             4

I included the empSTAT_DIFF variable (which calculates the difference in empSTAT within each block of caseID) in case the syntax command to create ESChange is conditional upon this.

I worry about transposing the dataset because there are other variables included in the data set which I don't want to combine for each instance we have data on a caseID.

Thanks,
Janene
Janene,
Much better explanation. Thank you. The value labels for eschange now make sense given the values for empstat. So now let me try to put the rules into words. Given a set of records with the same caseid:

1) If empstat changes from employed to unemployed and back again to employed, eschange is coded as 3=oscillating.
2) If empstat changes from unemployed to employed and back again to unemployed, eschange is coded as 3=oscillating.
3) If empstat changes from employed to unemployed but does not subsequently change back to employed, eschange is coded as 1=employed to unemployed.
4) If empstat changes from unemployed to employed but does not subsequently change back to unemployed, eschange is coded as 2=unemployed to employed.
5) If empstat is employed for all records in the set, eschange is coded as 4=no change.
6) If empstat is unemployed for all records in the set, eschange is coded as 4=no change.

Do you agree with this statement of the rules?

You say

>>I worry about transposing the dataset because there are other variables included in the data set which I don't want to combine for each instance we have data on a caseID.

Would you be willing to consider extracting just the variables needed for computing eschange to a new file, doing the rearrangements with that file, and then matching the resulting file, which might have just two variables, caseid and eschange, back to the original file?

Last question. Do you have a sequence number variable that identifies the order of records within a set of records with the same case id?

Gene Maguin
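A rough sketch of the extract, restructure and match-back route under the six rules above. It is not Gene's code. It assumes the first posting's variable names (caseID, date_emplymnt, ES), a real date variable, and no more than three records per caseID. The file names and the counters nLost and nGained are hypothetical.

```spss
* Sketch only. File names are hypothetical and at most 3 records per caseID are assumed.
SORT CASES BY caseID date_emplymnt.
SAVE OUTFILE='original_sorted.sav'.

* Keep only what the rules need, then go wide (creates ES.1, ES.2, ES.3).
GET FILE='original_sorted.sav' /KEEP = caseID ES.
CASESTOVARS /ID = caseID.

* Count changes in each direction between adjacent records.
COMPUTE nLost = 0.
COMPUTE nGained = 0.
DO REPEAT cur = ES.1 ES.2 /nxt = ES.2 ES.3.
  IF (cur EQ 1 AND nxt EQ 3) nLost = nLost + 1.
  IF (cur EQ 3 AND nxt EQ 1) nGained = nGained + 1.
END REPEAT.

* Rules 1 and 2 give 3, rule 3 gives 1, rule 4 gives 2, rules 5 and 6 give 4.
DO IF (nLost GT 0 AND nGained GT 0).
  COMPUTE eschange = 3.
ELSE IF (nLost GT 0).
  COMPUTE eschange = 1.
ELSE IF (nGained GT 0).
  COMPUTE eschange = 2.
ELSE.
  COMPUTE eschange = 4.
END IF.

* Save one record per caseID and match it back to every original record.
SAVE OUTFILE='eschange_by_id.sav' /KEEP = caseID eschange.
GET FILE='original_sorted.sav'.
MATCH FILES /FILE=* /TABLE='eschange_by_id.sav' /BY caseID.
EXECUTE.
```

Because only caseID and ES are restructured, the other variables in the original file are never transposed. A caseID with a single record ends up as 4=no change in this sketch.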
At 11:05 AM 3/19/2007, J Scelza wrote:
>To answer the question about the rules for the ESChange variable, we
>are looking at them in a linear context.

"linear" == "sequential in time?"

>For example (and where empSTAT is 1=employed, 3=unemployed and
>ESChange is 1=employed to unemployed, 2=unemployed to employed,
>3=oscillating, 4=no change)

OK: ESChange is
4 - Client employed (empSTAT=1) OR unemployed (empSTAT=3) in all records
1 - Client employed (empSTAT=1) in one record and unemployed (empSTAT=3) in the following record, but never the reverse
2 - Client unemployed (empSTAT=3) in one record and employed (empSTAT=1) in the following record, but never the reverse
3 - Change from 1 to 3, and from 3 to 1, both occur in the client's records.

Try this, with AGGREGATE logic; SPSS 15 draft output. Test data is from your first posting. (Gene, the date identifies the sequence number within one caseID; the second posting omitted the dates.)

*  ............   Post after this point   ............ .
*  ................................................... .
LIST /* with ESMean, ESdiff dropped */ .

List
|-----------------------------|---------------------------|
|Output Created               |20-MAR-2007 10:50:40       |
|-----------------------------|---------------------------|
caseID   DateEmpl ES

 1220  08/17/2005  1
 1220  03/09/2006  3
 1390  11/09/2005  1
 1390  02/08/2006  1
 1390  05/19/2006  3
 1445  08/15/2005  3
 1445  12/13/2005  1
 1518  11/09/2005  1
 1518  02/08/2006  3
 1518  05/19/2006  1

Number of cases read:  10    Number of cases listed:  10

*  OK: ESChange is                                                .
*  4 - Client employed (empSTAT=1) OR unemployed (empSTAT=3)      .
*      in all records                                             .
*  1 - Client employed (empSTAT=1) in one record and unemployed   .
*      (empSTAT=3) in the following record, but never the reverse .
*  2 - Client unemployed (empSTAT=3) in one record and employed   .
*      (empSTAT=1) in the following record, but never the reverse .
*  3 - Both of the above changes occur in the client's records.   .

*  Re-compute ESDiff:                                             .

.  NUMERIC ESDiff (F3).
.  COMPUTE ESDiff = $SYSMIS.
.  IF  caseID EQ LAG(caseID) ESDiff = ES - LAG(ES).

.  /**/ LIST /*-*/.

List
|-----------------------------|---------------------------|
|Output Created               |20-MAR-2007 10:50:41       |
|-----------------------------|---------------------------|
caseID   DateEmpl ES ESDiff

 1220  08/17/2005  1     .
 1220  03/09/2006  3     2
 1390  11/09/2005  1     .
 1390  02/08/2006  1     0
 1390  05/19/2006  3     2
 1445  08/15/2005  3     .
 1445  12/13/2005  1    -2
 1518  11/09/2005  1     .
 1518  02/08/2006  3     2
 1518  05/19/2006  1    -2

Number of cases read:  10    Number of cases listed:  10

AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /BREAK   = caseID
   /ESmean  = MEAN(ES)
   /MAXDiff = MAX(ESDiff)
   /MINDiff = MIN(ESDiff).
FORMATS ESmean (F5.2).

NUMERIC   ESChange (F3).
VAR LABEL ESChange
          'Summary of employment changes; requested variable'.
VAL LABEL ESChange
   1 'Became unemployed'
   2 'Became employed'
   3 'Change both ways'
   4 'No status change'.

DO IF     MISSING (MAXDiff)
       OR MISSING (MINDiff).
.  COMPUTE ESChange = 4.
ELSE IF   MAXDiff EQ 0
      AND MINDiff EQ 0.
.  COMPUTE ESChange = 4.
ELSE IF   MAXDiff GT 0 /* Became unemployed       */
      AND MINDiff GE 0 /* Never became employed   */.
.  COMPUTE ESChange = 1.
ELSE IF   MAXDiff LE 0 /* Never became unemployed */
      AND MINDiff LT 0 /* Became employed         */.
.  COMPUTE ESChange = 2.
ELSE IF   MAXDiff GT 0 /* Became unemployed       */
      AND MINDiff LT 0 /* Became employed         */.
.  COMPUTE ESChange = 3.
ELSE.
.  COMPUTE ESChange = 9.
END IF.

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |20-MAR-2007 10:50:42       |
|-----------------------------|---------------------------|
caseID   DateEmpl ES ESDiff ESmean MAXDiff MINDiff ESChange

 1220  08/17/2005  1     .    2.00     2       2       1
 1220  03/09/2006  3     2    2.00     2       2       1
 1390  11/09/2005  1     .    1.67     2       0       1
 1390  02/08/2006  1     0    1.67     2       0       1
 1390  05/19/2006  3     2    1.67     2       0       1
 1445  08/15/2005  3     .    2.00    -2      -2       2
 1445  12/13/2005  1    -2    2.00    -2      -2       2
 1518  11/09/2005  1     .    1.67     2      -2       3
 1518  02/08/2006  3     2    1.67     2      -2       3
 1518  05/19/2006  1    -2    1.67     2      -2       3

Number of cases read:  10    Number of cases listed:  10

=======================================
APPENDIX: Test data, from first posting
=======================================
*  ............   Test data               ............ .
*  (from the original posting)                         .
DATA LIST LIST SKIP=2
   /caseID (N) DateEmpl (ADATE) ES ESmean ESDiff (3F).
BEGIN DATA
caseID date_emplymnt ES ESmean ESDiff

1220 08/17/05 1 2 .
1220 03/09/06 3 2 2
1390 11/09/05 1 1.67 .
1390 02/08/06 1 1.67 0
1390 05/19/06 3 1.67 2
1445 08/15/05 3 2 .
1445 12/13/05 1 2 -2
1518 11/09/05 1 1.67 .
1518 02/08/06 3 1.67 2
1518 05/19/06 1 1.67 -2
END DATA.
FORMATS caseID (N4)
       /DateEmpl (ADATE10)
       /ES ESmean ESDiff (F3).

SORT CASES BY caseID DateEmpl.
DELETE VARIABLES ESMean ESDiff.