matching cases-controls

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

matching cases-controls

Maguin, Eugene
This is an off-list message from H. Ozgen [[hidden email]]. If you
reply off-list, please do so to him/her and not me. Better yet, reply to the
list.

I have a dataset of 400 patietns (age range: 3-18) and 1000( age range 8-14)
controls.
I am trying to match them with sex and +/- 1 or 2 age.
how can I do that? is there any syntax for this? any suggestions welcome.
THANKS in advance for considering this problem.
H Ozgen

Heval Ozgen, MD
Department of Child and Adolescent Psychiatry
University Medical Center, Heidelberglaan 100
3508 GA Utrecht HP B 01.324
tel.: ++ 31 30 250 6362
fax.:++ 31 30 250 5444

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: matching cases-controls

Richard Ristow
At 10:37 AM 1/21/2008, Ozgen, H.M. wrote, off-list:
[...]
>THANKS in advance for considering this problem.

I join with Gene Maguin in feeling that questions, and responses,
should usually go to the whole list.
+ Posting invites all interested list members to apply their
knowledge and energy to the problem.
+ Posted questions and responses can inform everyone on the list;
off-list questions and responses inform only one person.

In that spirit, continuing on-list,

>I have a dataset of 400 patietns (age range: 3-18) and 1000( age
>range 8-14) controls.
>I am trying to match them with sex and +/- 1 or 2 age.

First, a simple question: if your patients' ages are from 3-18 and
your controls are all 8 years of age and over, how will you match
those patients less than 8 years old?

Finding best matches between cases and candidate controls by
comparing all possible pairs is a many-to-many merge, somewhat awkward.

But it may be much easier, in your case. With those sizes, you
probably have many cases and controls of both sexes, for every year
of age. It could reduce to randomly selecting as many controls as you
need for each sex-year combination, and randomly assigning those to
the cases with the same sex and year of age. The two random
operations can be done as one, using a random sort.

Does this get you any closer?

-With best wishes,
  Richard Ristow

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: matching cases-controls

H-19
In reply to this post by Maguin, Eugene
Thank you.
in the patient group sex ratio is 4/1 ( M/f) whereas in the control group
that is 1/1.
For age matching i can go +/- 1 if that will give me too less patients
then may try +/-2.

and for your question: how will i match the patients aged less than 8: i
can still match those are 6 or 7 yrs old, but the rest i loose..
this is also true for age 16 and above.

H

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: matching cases-controls

Maguin, Eugene
In reply to this post by Richard Ristow
All,

This is a reply to H. M. Ozgen who several days ago posted this question:

>>I have a dataset of 400 patients (age range: 3-18) and 1000( age range
8-14) controls.
I am trying to match them with sex and +/- 1 or 2 age.

Below is the syntax file I sent that creates two example data files meeting
his specifications and the syntax necessary to perform the matching
operation. Note that because of Ozgen's descriptions of the two samples and
my assumptions in building the example datasets, the solution was generally
simple. The simplicity came from the fact that there was an excess of
controls at every sex-age combination, which allowed me to have an exact age
match, except for the 6 and 7 year old patient groups. The 6 and 7 year old
patient groups imposed complexity because they could be matched with 8 year
old controls of the same sex. The programming complexity would have been
much greater had there been insufficient numbers of controls for an exact
sex-age match.

Lastly, I welcome comments or questions on my code so that I can improve my
understanding of this type of problem.

**********************************************************.
**********************************************************.
*  I'll assume you have patients in one file, called patients, and
*  controls in another file, called controls.
*  The key variables are id, age, sex. They are named as follows:
*  Patient file: id=cid, age=page, sex=psex (1=male, 2=female).
*  Control file: id=cid, age=cage, sex=csex (1=male, 2=female).


*  preliminary. create example data files.
input program.  /* patient file.
loop #i=1 to 400.
+  compute pid=#i.
+  compute psex=trunc(uniform(2))+1.
+  compute page=trunc(uniform(16))+3.
end case.
end loop.
end file.
end input program.
format pid(f3.0) psex(F1.0) page(f2.0).

frequencies pid psex page.

save outfile='c:\temp\patient.sav'.


input program.  /* control file.
loop #i=1 to 1000.
+  compute cid=#i.
+  compute csex=trunc(uniform(2))+1.
+  compute cage=trunc(uniform(7))+8.
end case.
end loop.
end file.
end input program.
format cid(f3.0) csex(F1.0) cage(f2.0).

frequencies cid csex cage.

save outfile='c:\temp\control.sav'.


**************************************************.
**************************************************.
*  Below here begins the matching operations.
**************************************************.



*A. Patient file activities.

Get file='c:\temp\patient.sav'.

*  Make a string composite sex plus age variable.
String link(a3).
Compute link=concat(string(psex,f1.0),string(trunc(page),f2.0)).

Frequencies link. /* to see how many combinations you have.

*  Aggregate and save file as count of cases needing a match.
Aggregate outfile='c:\temp\patientagg1.sav'/break=link/precs=nu.
execute.


*B. Control file activities.

Get file='c:\temp\control.sav'.

*  Make a string composite sex plus age variable.
String link(a3).
Compute link=concat(string(csex,f1.0),string(trunc(cage),f2.0)).

Frequencies link. /* to see how many combinations you have.

*  Aggregate and save file as count of cases needing a match.
Aggregate outfile='c:\temp\controlagg1.sav'/break=link/crecs=nu.
execute.


*C. Assessing the matching prospects.
*  This is the critical operation as you see for the first
*  time what the problems will be. What the crosstabs show
*  is the number of patient cases and the number of control
*  cases for each value of the matching variables.
match files file='c:\temp\patientagg1.sav'/
   file='c:\temp\controlagg1.sav'/by link.

crosstabs precs by crecs by link.


*  Conclusion: in every case the number control records available for
*  matching is greater than the number of patient records needing a
*  match. This is best possibole outcome. However, will need to consider
*  distribution end effects for lower end of patient distribution.


*  D. Back to control file.

Get file='c:\temp\control.sav'.

*  Make a string composite sex plus age variable.
String link(a3).
Compute link=concat(string(csex,f1.0),string(trunc(cage),f2.0)).

Frequencies link. /* to see how many combinations you have.


Sort cases by link.

*  Match the aggregated patient file to the control file.
match files file=*/table='c:\temp\patientagg1.sav'/by link.
execute.


*  Draw a number from a uniform distribution, sort by link and random
*  value, then number cases within link value.
compute ranval=uniform(1).

sort cases by link ranval.

compute linkrec=1.
if (link eq lag(link)) linkrec=lag(linkrec)+1.
execute.


*  keep the record if the rec number is less than equal to number of patient
*  records for that link value.
compute pick=0.
if (linkrec le precs) pick=1.
execute.


*  save control file for later use for distribution end effects.
*  then select and save records needed for matching.
Save outfile='c:\temp\control1.sav'.

select if (pick eq 1).
execute.

Save outfile='c:\temp\controlmatch1.sav'/drop ranval pick precs.


*E. Back to Patient file.

Get file='c:\temp\patient.sav'.

*  Make a string composite sex plus age variable.
String link(a3).
Compute link=concat(string(psex,f1.0),string(trunc(page),f2.0)).

*  sort cases by link and then number (use same variable names).
sort cases by link.

compute linkrec=1.
if (link eq lag(link)) linkrec=lag(linkrec)+1.
execute.

*  finally, match this file with the selected control cases.
match files file=*/file='c:\temp\controlmatch1.sav'/by link linkrec.
execute.

Save outfile='c:\temp\patient1.sav'.


*F. Accounting for end effects.

*  problem is that 6 yr old and 7 yr old patients can be matched to unused
*  8 yr old controls.

Get file='c:\temp\control1.sav'.

*  select unsed male 8 yr olds and unused female 8 yr olds.
select if ((link eq '1 8' or link eq '2 8') and pick eq 0).
execute.

*  review frequency distribution for patient file link to see how many
*  male and female 6 yr old and 7 yr old patients there are.
*  there are 9 male 6 yr old patients,
*  10 male 7 yr old patients,
*  13 female 6 yr old patients, and
*  14 female 7 yr old patients.
*  how many control remain?.
frequencies link.


*  42 8 yr old control males and 46 8 yr old control females remain. Enough.
*  now select them. renumber cases within link. basically divide cases by
*  hand, then change value of link to '1 6', '1 7', '2 6', '2 7' and
*  correct case numbering.
compute linkrec=1.
if (link eq lag(link)) linkrec=lag(linkrec)+1.
execute.

do if (link eq '1 8').
+  do if (linkrec le 9).
+     compute link='1 6'.
+     compute pick=1.
+  else if (linkrec ge 10 and linkrec le 19).
+     compute link='1 7'.
+     compute linkrec=linkrec-9.
+     compute pick=1.
+  else.
+     compute pick=0.
+  end if.
else if (link eq '2 8').
+  do if (linkrec le 13).
+     compute link='2 6'.
+     compute pick=1.
+  else if (linkrec ge 14 and linkrec le 27).
+     compute link='2 7'.
+     compute linkrec=linkrec-13.
+     compute pick=1.
+  else.
+     compute pick=0.
+  end if.
end if.
execute.
*  better visually check this for correct operation.

*  OK. select control cases to retain. then match to new patient file.
select if (pick eq 1).
execute.

match files file=*/file='c:\temp\patient1.sav'/by link linkrec.
execute.

Save outfile='c:\temp\patient2.sav'/drop ranval pick precs.


*G. Final check.

get file='c:\temp\patient2.sav'.

String psexage csexage(a3).
Compute psexage=concat(string(psex,f1.0),string(trunc(page),f2.0)).
Compute csexage=concat(string(csex,f1.0),string(trunc(cage),f2.0)).

crosstab psexage by csexage by link.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: matching cases-controls

Maguin, Eugene
In reply to this post by H-19
Heval,

The code I sent earlier today won't work for male patients. I didn't know
about the differential sex distribution in the patient file. I need more
details. Would you post to the list a crosstab of age by sex for patients
and the same for controls.

The code will work for female patients and female controls but will need
some modification because you do not need to account for sex.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: matching cases-controls

H-19
In reply to this post by Maguin, Eugene
age * gender Crosstabulation
Count
diagnose     gender  Total
   male female
control age 8 4 0 4
  9 93 80 173
  10 171 167 338
  11 212 201 413
  12 37 35 72
  13 2 3 5
  14 0 2 2
 Total  519 488 1007
asd age 3 10 2 12
  4 19 2 21
  5 20 3 23
  6 15 4 19
  7 25 4 29
  8 27 5 32
  9 32 4 36
  10 18 7 25
  11 23 3 26
  12 16 9 25
  13 25 3 28
  14 19 1 20
  15 13 1 14
  16 12 1 13
  17 3 3 6
  18 5 3 8
 Total  282 55 337

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Automate Add Value Labels using Python (or something)?

King Douglas
In reply to this post by Maguin, Eugene
Folks,

I'm working on a way to automate an Add Values statement for an automatically generated numeric variable for month (1 to 24) that represents a selected time period across a number of files.

For instance, say I have quarterly and monthly tracking survey results going back a number of years.  The survey dates are represented in three numeric variables, Year, Month, Day.

I can create the new variable values (1-24) with ordinary syntax, but now want to automatically add value labels ("MoYr") based on the 24-month sequence.

The "Month" variable already contains value labels (Jan, Feb...Dec).  I think I could use Python to extract the labels and simply append the year, writing the results to a syntax file that I can insert into my program.

Now that I have theoretically solved this problem, I'll get to work.  :>)
If anyone has any handy code or other suggestions while I'm working at it, that would be much appreciated.

King Douglas
American Airlines Customer Research

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: matching cases-controls

Richard Ristow
In reply to this post by Maguin, Eugene
At 10:37 AM 1/21/2008, Ozgen, H.M. wrote, initially off-list:

>I have a dataset of 400 patietns (age range: 3-18) and 1000( age
>range 8-14) controls.
>I am trying to match them with sex and +/- 1 or 2 age.

Here's a statistical question: Is this necessary, or useful?

Talk to your advisor, of course. (Not that you have any choice.)

Studies with matched controls are always expensive, either in
resources (to enroll enough extra controls to get matches for all
subjects), or in sample size (because of the enrolled controls that
aren't chosen as matches). And it's likely enough you still won't
match on all characteristics (including ones you don't know about)
that are relevant to the outcome. You don't give your full study
design, but from what you've said, I wonder whether comparing subject
and control groups with the usual inter-group ANOVA/ANCOVA/...
procedures wouldn't be at least as powerful, in addition to being much easier.

APPENDIX: Where are matched-control designs useful?

. Where you have very large populations of candidate subjects and
controls; you're willing to spend a lot of money on the study; and
you want interpretation to be unambiguous even to those who don't
know statistics.
   The example that comes to mind is the studies (in the United
States) that showed the adverse health effects of cigarette smoking.
In this case, a.) the candidate populations were huge, namely all
smokers and all non-smokers in the United States; b.) the American
Cancer Society and others were willing to spend a lot of money, as
they had it, and regarded the results as critically important; c.)
unambiguous interpretation was crucial, as tobacco-industry groups,
etc., would devote considerable resources to discrediting any positive results.

. In experimental studies that lend themselves to enrolling paired
subjects and controls
   For example, see split-plot studies in agriculture.

. Where there are many more candidate controls than there are
subjects; assessing whether a candidate control matches a subject is
easy; and gathering data on a control is expensive, so you don't
enroll all the candidates.
   I've been on two of these:
   1) A comparison of infants born to diabetic vs. non-diabetic
mothers. Candidate populations were all births at a large maternity
hospital, with diabetic mothers a small fraction of all births. It
was easy to select controls matched by near-same birth date; I don't
remember what other characteristics were matched on
   2) A comparison of urban buildings that had experienced a major
fire, with ones that had not. Choosing a geographically matched
qualified control from nearby un-burned buildings was fairly easy. In
the study design it was crucial, since the purpose was to elicit
effects other than riskiness of location.

-Onward, ever onward,
  Richard

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: matching cases-controls

H-19
In reply to this post by Maguin, Eugene
Thank you!

You are so very right by asking whether this is necessary, or useful? I
have already discussed this issue several times with my supervisors they
insist that i should match, whereas my statistician does not agree.

In the end what i look is the odds ratio between the patients and
controls.the rest of the analyses ( e.g. clustering of the variables, 683
variables i have; for that i will not use the matchied pairs)

So i will only use the matched pairs to see the odds between these cases
and controls.

Many thanks
H

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Automate Add Value Labels using Python (or something)?

Oliver, Richard
In reply to this post by King Douglas
Maybe I don't understand what you need, but if all you need is a "label" that provides month and year:

compute datevar=date.mdy(month, day, year).
formats datevar (moyr8).

The end result is that datevar values are displayed in moyr format (e.g., OCT 2007).

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of King Douglas
Sent: Wednesday, January 23, 2008 11:31 AM
To: [hidden email]
Subject: Automate Add Value Labels using Python (or something)?

Folks,

I'm working on a way to automate an Add Values statement for an automatically generated numeric variable for month (1 to 24) that represents a selected time period across a number of files.

For instance, say I have quarterly and monthly tracking survey results going back a number of years.  The survey dates are represented in three numeric variables, Year, Month, Day.

I can create the new variable values (1-24) with ordinary syntax, but now want to automatically add value labels ("MoYr") based on the 24-month sequence.

The "Month" variable already contains value labels (Jan, Feb...Dec).  I think I could use Python to extract the labels and simply append the year, writing the results to a syntax file that I can insert into my program.

Now that I have theoretically solved this problem, I'll get to work.  :>)
If anyone has any handy code or other suggestions while I'm working at it, that would be much appreciated.

King Douglas
American Airlines Customer Research

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Automate Add Value Labels using Python (or something)?

King Douglas
Thanks, Richard,

If I were to revise what I want each month, it would look something like this.

Aggregate more than two year's of data and select a 24-month period, say March 2005 through February 2007.
Compute a sequential,numeric variable (MONTHVAR) for each month such that March 2005 = 1 and February 2007 = 24.

Write a value labels statement

VALUE LABELS MONTHVAR
1 'Mar 2005'
2 'Apr 2005'
.
.
.
.
.
22 'Dec 2006'
23 'Jan 2007'
24 'Feb 2007'.

What I want to do is automate the Value Labels statement to account only for the periods contained in the data file and limit the labels to those months.

I think I've just about got it figured out using AGGREGATE and writing ADD VALUES syntax to an SPS file.  I don't think Python is necesssary or even the best way to go.

Thanks again for thinking about this,

King


"Oliver, Richard" <[hidden email]> wrote: Maybe I don't understand what you need, but if all you need is a "label" that provides month and year:

compute datevar=date.mdy(month, day, year).
formats datevar (moyr8).

The end result is that datevar values are displayed in moyr format (e.g., OCT 2007).

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of King Douglas
Sent: Wednesday, January 23, 2008 11:31 AM
To: [hidden email]
Subject: Automate Add Value Labels using Python (or something)?

Folks,

I'm working on a way to automate an Add Values statement for an automatically generated numeric variable for month (1 to 24) that represents a selected time period across a number of files.

For instance, say I have quarterly and monthly tracking survey results going back a number of years.  The survey dates are represented in three numeric variables, Year, Month, Day.

I can create the new variable values (1-24) with ordinary syntax, but now want to automatically add value labels ("MoYr") based on the 24-month sequence.

The "Month" variable already contains value labels (Jan, Feb...Dec).  I think I could use Python to extract the labels and simply append the year, writing the results to a syntax file that I can insert into my program.

Now that I have theoretically solved this problem, I'll get to work.  :>)
If anyone has any handy code or other suggestions while I'm working at it, that would be much appreciated.

King Douglas
American Airlines Customer Research

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: matching cases-controls

Marta Garcia-Granero
In reply to this post by H-19
Hi H


You wrote

    "You are so very right by asking whether this is necessary, or useful? I
    have already discussed this issue several times with my supervisors they
    insist that i should match, whereas my statistician does not agree.


    In the end what i look is the odds ratio between the patients and

    controls.the rest of the analyses ( e.g. clustering of the variables, 683
    variables i have; for that i will not use the matchied pairs)

    So i will only use the matched pairs to see the odds between these cases
    and controls."



Are your supervisors aware of the fact that matching data complicates
the statistical analysis? You will need to use either McNemar's OR or
conditional logistic regression. My opinion: if your statistician
doesn't think it's necessary, he/she is probably quite right (I suppose
you have reached the conclusion that I'm a statistician too). Matching
is done to avoid confounding caused by factors known to affect the
outcome, but a wrong statistical analysis couls bias your results a lot.

HTH,
Marta García-Granero

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

SOLVED: Automate Add Value Labels--without Python

King Douglas
In reply to this post by Oliver, Richard
Folks,

I needed to automate a process of creating a new numeric variable, with appropriate Month-Year value labels, to represent 24 months of data where the time periods would change from time to time.  This will work for any time period that has numeric variables for Year (yyyy) and Month (1-12), even if the months are not consecutive.

Using SPSS 15 base only, the following does the trick.  I'm sure this could be simplified or done other ways, including Python.

Starting with the master data file, selected for the periods of interest:

* COPY THE "MASTER" DATA FILE (SORTED ON YEAR, MONTH).
* THE COPY WILL BE CONDENSED INTO A "LOOKUP" TABLE.

DATASET COPY LOOKUP.

DATASET ACTIVATE LOOKUP.

* SELECT ONLY THE FIRST CASE FOR EACH YEAR-MONTH PAIR.

MATCH FILES FILE = *
  /BY YEAR MONTH
  /FIRST = FYRMO.

SELECT IF FYRMO.

* CREATE A NUMERIC VARIABLE THAT WILL CONTAIN VALUES 1 TO N<100,
* REFLECTING THE NUMBER OF YEAR-MONTH PAIRS.

NUMERIC MONTHVAR (F2.0).
COMPUTE MONTHVAR = $CASENUM.

* ESTABLISH A FIRST AND LAST CASE FOR THE CONDENSED FILE.

MATCH FILES FILE = *
  /BY MONTHVAR
  /LAST = LAST
  /FIRST = FIRST.

* CREATE A STRING VARIABLE TO REPRESENT EVERY VALUE FOR THE 12-MONTH CYCLE.

STRING MONTHVAL (A4).
DO REPEAT P = 1 TO 12
  /Q = 'Jan ' 'Feb ' 'Mar ' 'Apr ' 'May ' 'Jun '
       'Jul ' 'Aug ' 'Sep ' 'Oct ' 'Nov ' 'Dec '.
IF MONTH = P MONTHVAL = Q.
END REPEAT PRINT.

* CREATE A STRING VARIABLE AS A PLACEHOLDER FOR THE
* VALUE LABELS FOR EACH OF THE YEAR-MONTH PAIRS.
* ADD THE TWO-DIGIT YEAR AFTER THE MONTH.

STRING MONTHVARVAL (A6).
COMPUTE MONTHVARVAL = CONCATENATE(MONTHVAL,STRING(YEAR-2000,N2)).

* WRITE THE VALUE LABELS TO A SYNTAX FILE.
* (MODIFIED FROM RAY LEVESQUE'S SPSSTOOLS.NET).

DO IF FIRST.
WRITE OUTFILE = "C:\TEMP\ADDVAL.SPS"
   /"ADD VALUE LABELS MONTHVAR " MONTHVAR " '" MONTHVARVAL "'.".
ELSE IF NOT LAST.
WRITE OUTFILE = "C:\TEMP\ADDVAL.SPS"
   /"ADD VALUE LABELS MONTHVAR " MONTHVAR " '" MONTHVARVAL "'.".
ELSE.
WRITE OUTFILE = "C:\TEMP\ADDVAL.SPS"
   /"ADD VALUE LABELS MONTHVAR " MONTHVAR " '" MONTHVARVAL "'.".
END IF.

* KEEP ONLY THE VARIABLES ESSENTIAL TO A LOOKUP TABLE.

MATCH FILE FILE = *
  /KEEP YEAR MONTH MONTHVAR.

* ACTIVATE THE PRE-SORTED MASTER FILE AND
* MATCH IT TO THE LOOKUP TABLE.

DATASET ACTIVATE MASTER.

MATCH FILES FILE = *
  /TABLE = LOOKUP
  /BY YEAR MONTH.

* INSERT THE ADD VALUE LABELS PROGRAM AND TEST THE RESULTS.

INSERT FILE = "C:\TEMP\ADDVAL.SPS".

MATCH FILES FILE = *
  /KEEP YEAR MONTH MONTHVAR.

FREQ MONTHVAR.


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of King Douglas
Sent: Wednesday, January 23, 2008 11:31 AM
To: [hidden email]
Subject: Automate Add Value Labels using Python (or something)?

Folks,

I'm working on a way to automate an Add Values statement for an automatically generated numeric variable for month (1 to 24) that represents a selected time period across a number of files.

For instance, say I have quarterly and monthly tracking survey results going back a number of years.  The survey dates are represented in three numeric variables, Year, Month, Day.

I can create the new variable values (1-24) with ordinary syntax, but now want to automatically add value labels ("MoYr") based on the 24-month sequence.

The "Month" variable already contains value labels (Jan, Feb...Dec).  I think I could use Python to extract the labels and simply append the year, writing the results to a syntax file that I can insert into my program.

Now that I have theoretically solved this problem, I'll get to work.  :>)
If anyone has any handy code or other suggestions while I'm working at it, that would be much appreciated.

King Douglas
American Airlines Customer Research

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Automate Add Value Labels using Python (or something)?

Richard Ristow
In reply to this post by King Douglas
At 12:31 PM 1/23/2008, King Douglas wrote:

>I'm working on a way to automate an Add Values statement for an
>automatically generated numeric variable for month (1 to 24) that
>represents a selected time period across a number of files.
>
>I think I could use Python to extract the labels, writing the
>results to a syntax file that I can insert into my program.

You may find native SPSS code is simpler for this. It's certainly
handled "write and insert" logic well, for a long time.

>For instance, say I have quarterly and monthly tracking survey
>results going back a number of years.  The survey dates are
>represented in three numeric variables, Year, Month, Day.
>
>I can create the new variable values (1-24) with ordinary syntax,
>but now want to automatically add value labels ("MoYr") based on the
>24-month sequence.

Since it looks like you're labeling with calendar month-year, would
assigning date format "MOYR6" serve your turn? No, I think I see why
not: your 'new variable' is the sequential month within the survey,
running 1-24. (Right; that's what you responded to Richard Oliver.)

>The "Month" variable already contains value labels (Jan,
>Feb...Dec).  I think I could use Python to extract the labels and
>simply append the year, writing the results to a syntax file that I
>can insert into my program.
>
>Now that I have theoretically solved this problem, I'll get to work.  :>)
>If anyone has any handy code or other suggestions while I'm working
>at it, that would be much appreciated.

Below is a native-SPSS solution. OK, here's to me, for thinking
code-generating code could be 'simple'. Python has the advantage of
direct access to the data dictionary, including the value labels.

On the other hand, the Python solution may be somewhat complicated,
because the mapping of survey month to calendar month is an attribute
of the data, not the data dictionary.

Anyhow, see how you like this. Test data, and code without
interspersed output lines, follow the listing.


|-----------------------------|---------------------------|
|Output Created               |23-JAN-2008 19:08:09       |
|-----------------------------|---------------------------|
[TestData]

CaseID        Date Year Month  Day SvyMonth

   001  12-MAY-2003 2003     5   12      1
   002  21-JUN-2003 2003     6   21      2
   003  30-JUN-2003 2003     6   30      2
   004  16-JUL-2003 2003     7   16      3
   005  14-SEP-2003 2003     9   14      5
   006  29-SEP-2003 2003     9   29      5
   007  03-OCT-2003 2003    10    3      6
   008  08-OCT-2003 2003    10    8      6
   009  03-NOV-2003 2003    11    3      7
   010  10-NOV-2003 2003    11   10      7
   011  13-DEC-2003 2003    12   13      8
   012  17-DEC-2003 2003    12   17      8
   013  23-DEC-2003 2003    12   23      8
   014  23-JAN-2004 2004     1   23      9
   015  01-FEB-2004 2004     2    1     10
   016  01-FEB-2004 2004     2    1     10
   017  05-FEB-2004 2004     2    5     10
   018  11-FEB-2004 2004     2   11     10
   019  17-FEB-2004 2004     2   17     10
   020  23-FEB-2004 2004     2   23     10
   021  02-APR-2004 2004     4    2     12
   022  14-APR-2004 2004     4   14     12
   023  14-APR-2004 2004     4   14     12
   024  04-MAY-2004 2004     5    4     13
   025  21-MAY-2004 2004     5   21     13
   026  04-JUN-2004 2004     6    4     14
   027  22-JUN-2004 2004     6   22     14
   028  09-JUL-2004 2004     7    9     15
   029  05-OCT-2004 2004    10    5     18
   030  28-OCT-2004 2004    10   28     18
   031  31-OCT-2004 2004    10   31     18
   032  03-NOV-2004 2004    11    3     19
   033  09-NOV-2004 2004    11    9     19
   034  16-DEC-2004 2004    12   16     20
   035  11-JAN-2005 2005     1   11     21
   036  20-FEB-2005 2005     2   20     22
   037  18-APR-2005 2005     4   18     24
   038  29-APR-2005 2005     4   29     24
   039  07-MAY-2005 2005     5    7     25

Number of cases read:  39    Number of cases listed:  39


*  .................   Post after this point   ..................... .
*  ................................................................. .
DATASET ACTIVATE TestData WINDOW=FRONT  /* Helpful, during testing */.

*  I.     PREPARE A FILE from which to generate code    ............ .

*  I.A    Add a dummy record for every month in the survey, to make  .
*         sure all months are represented.                           .

NEW  FILE.
INPUT PROGRAM.
.   LOOP  SvyMonth = 1 TO 24.
.      END CASE.
.   END LOOP.
END FILE.
END INPUT PROGRAM.

ADD FILES
   /FILE=TestData
   /FILE=*.

*  I.B    Create a file with one record for every survey month.      .
*         (It retains one survey date and one calendar-month value,  .
*         so the value label can be built either using the existing  .
*         label for 'Month', or the SPSS 'MONTHn' format.)           .

DATASET   DECLARE CodeGen.
AGGREGATE OUTFILE=CodeGen
    /BREAK=SvyMonth
    /FrstDate 'To have one date value in the result'        = MIN(Date)
    /CalndrMn 'To have the old "Month" value in the result' = MIN(Month)
    /NResps   'Number of responses - I''m always curious'   = NU.

DATASET ACTIVATE CodeGen  WINDOW=FRONT.
*  (The next line has nothing to do with the purpose of this code)   .
.  COMPUTE NResps = NResps - 1  /* Discount the dummy record       */.

*  Check an error that baffled me for a while.                ...... .
*  For resolution, see I.C., below.                           ...... .
.  /*--  DISPLAY DICTIONARY /VARIABLES=CalndrMn  /*-*/.

.  /*--  TEMPORARY                               /*-*/.
.  /*--  STRING  OldLabel (A5)                   /*-*/.
.  /*--  COMPUTE OldLabel=VALUELABEL(CalndrMn)   /*-*/.
.  /*--  LIST                                    /*-*/.


*  I.C    Fill in "date" and calendar month, for survey months with  .
*         no observed responses.                                     .

NUMERIC NoBreak  (F2).
COMPUTE NoBreak =  1.


AGGREGATE OUTFILE=* MODE=ADDVARIABLES
    /BREAK=NOBREAK
    /Ur_Month 'First survey month with any response' = MIN(SvyMonth)
    /Ur_Date  'The very earliest response date'      = MIN(FrstDate).

NUMERIC #Ur_Day1 (DATE11) /* First day of first response month     */
         #SvyDay1 (DATE11) /* First day of first month of survey    */.

DO IF   MISSING(FrstDate).
.  COMPUTE #Ur_Day1 = DATE.MDY(XDATE.MONTH(Ur_Date),
                                1,
                                XDATE.YEAR (Ur_Date)).
*  Filled-in date is the first day of the survey month      ....... .
.  COMPUTE #SvyDay1 = DATESUM(#Ur_Day1,(1-Ur_Month),"Months").
.  COMPUTE FrstDate = DATESUM(#Ur_Day1,(SvyMonth-1),"Months").
.  COMPUTE CalndrMn = XDATE.MONTH(FrstDate).
END IF.

.  /**/  LIST  /*-*/.

List
|-----------------------------|---------------------------|
|Output Created               |23-JAN-2008 19:08:12       |
|-----------------------------|---------------------------|
[CodeGen]

SvyMonth    FrstDate CalndrMn  NResps NoBreak Ur_Month     Ur_Date

      1   12-MAY-2003      5         1     1        1   12-MAY-2003
      2   21-JUN-2003      6         2     1        1   12-MAY-2003
      3   16-JUL-2003      7         1     1        1   12-MAY-2003
      4   01-AUG-2003      8         0     1        1   12-MAY-2003
      5   14-SEP-2003      9         2     1        1   12-MAY-2003
      6   03-OCT-2003     10         2     1        1   12-MAY-2003
      7   03-NOV-2003     11         2     1        1   12-MAY-2003
      8   13-DEC-2003     12         3     1        1   12-MAY-2003
      9   23-JAN-2004      1         1     1        1   12-MAY-2003
     10   01-FEB-2004      2         6     1        1   12-MAY-2003
     11   01-MAR-2004      3         0     1        1   12-MAY-2003
     12   02-APR-2004      4         3     1        1   12-MAY-2003
     13   04-MAY-2004      5         2     1        1   12-MAY-2003
     14   04-JUN-2004      6         2     1        1   12-MAY-2003
     15   09-JUL-2004      7         1     1        1   12-MAY-2003
     16   01-AUG-2004      8         0     1        1   12-MAY-2003
     17   01-SEP-2004      9         0     1        1   12-MAY-2003
     18   05-OCT-2004     10         3     1        1   12-MAY-2003
     19   03-NOV-2004     11         2     1        1   12-MAY-2003
     20   16-DEC-2004     12         1     1        1   12-MAY-2003
     21   11-JAN-2005      1         1     1        1   12-MAY-2003
     22   20-FEB-2005      2         1     1        1   12-MAY-2003
     23   01-MAR-2005      3         0     1        1   12-MAY-2003
     24   18-APR-2005      4         2     1        1   12-MAY-2003
     25   07-MAY-2005      5         0     1        1   12-MAY-2003

Number of cases read:  25    Number of cases listed:  25


*  II.    GENERATE THE CODE, and write to the external file          .

ADD FILES
   /FILE = *
   /BY     NoBreak
   /FIRST= ItBegins
   /LAST = ItEnds.


STRING #Syntax (A60) /* To build the output line in */.

DO IF   ItBegins     /* The command header line:    */.
.  COMPUTE #Syntax = "ADD VALUE LABELS SvyMonth".
.  WRITE OUTFILE=Syntax
                /#Syntax.
END IF.

*  ....              /* The label   lines:          */.
** (In quest for clearer code, this uses more       **.
** variables and lines than strictly necessary.)    **.

STRING  #Label (A5)
         #LblMn (A3)
         #LblYr (A4).

*  Month name for label, using SPSS "MONTHn" format **.
*xxCOMPUTE #LblMn= STRING(FrstDate,MONTH3).

*  Month name for label, using existing value labels**.
+  COMPUTE #LblMn= VALUELABEL(CalndrMn).

*  Two-digit year, for label                        **.
COMPUTE #LblYr=    STRING(XDATE.YEAR(FrstDate),N4).
COMPUTE #LblYr=    SUBSTR(#LblYr,3).

*  The complete label text:                         **.
COMPUTE #Label=    CONCAT(#LblMn,#LblYr).

COMPUTE #Syntax
          = CONCAT("      "           ,
                   STRING(SvyMonth,F3),
                   "     '"           ,
                   #Label             ,
                   "'"                ).


DO IF   ItEnds       /* Add period to the last line:*/.
.  COMPUTE #Syntax = CONCAT(RTRIM(#Syntax),".").
END IF.

+  WRITE OUTFILE=Syntax / #Syntax.

EXECUTE   /* An instance where EXECUTE is needed                   */.



*  III.   TRACE AND TEST: List the generated code       ............ .

+  /**/   DATA LIST FIXED            /*-*/
    /**/        FILE=Syntax /         /*-*/
    /**/        CodeLine    01-60(A)  /*-*/.

Data List will read 1 records from C:\Documents and Settings\...hon -
SYNTAX.SAV

Variable          Rec   Start     End  Format

CodeLine            1       1      60  A60


+  /**/   LIST                       /*-*/.

List
|-----------------------------|---------------------------|
|Output Created               |23-JAN-2008 19:08:14       |
|-----------------------------|---------------------------|
CodeLine

ADD VALUE LABELS SvyMonth
         1     'May03'
         2     'Jun03'
         3     'Jul03'
         4     'Aug03'
         5     'Sep03'
         6     'Oct03'
         7     'Nov03'
         8     'Dec03'
         9     'Jan04'
        10     'Feb04'
        11     'Mar04'
        12     'Apr04'
        13     'May04'
        14     'Jun04'
        15     'Jul04'
        16     'Aug04'
        17     'Sep04'
        18     'Oct04'
        19     'Nov04'
        20     'Dec04'
        21     'Jan05'
        22     'Feb05'
        23     'Mar05'
        24     'Apr05'
        25     'May05'.


Number of cases read:  26    Number of cases listed:  26


*  IV.    INCLUDE AND EXECUTE the code                  ............ .

DATASET ACTIVATE TestData.
INSERT  File=Syntax.
  227  ADD VALUE LABELS SvyMonth
  228          1     'May03'
  229          2     'Jun03'
  230          3     'Jul03'
  231          4     'Aug03'
  232          5     'Sep03'
  233          6     'Oct03'
  234          7     'Nov03'
  235          8     'Dec03'
  236          9     'Jan04'
  237         10     'Feb04'
  238         11     'Mar04'
  239         12     'Apr04'
  240         13     'May04'
  241         14     'Jun04'
  242         15     'Jul04'
  243         16     'Aug04'
  244         17     'Sep04'
  245         18     'Oct04'
  246         19     'Nov04'
  247         20     'Dec04'
  248         21     'Jan05'
  249         22     'Feb05'
  250         23     'Mar05'
  251         24     'Apr05'
  252         25     'May05'.
  253
  254  * End of INSERT and INCLUDE nesting level 01.

DISPLAY DICTIONARY /VARIABLES=SvyMonth.

File Information
|-----------------------------|---------------------------|
|Output Created               |23-JAN-2008 19:08:14       |
|-----------------------------|---------------------------|
[TestData]

Variable Information
|--------|--------|------|------------|------|---------|------|------|
|Variable|Position|Label |Measurement |Column|Alignment|Print |Write |
|        |        |      |Level       |Width |         |Format|Format|
|--------|--------|------|------------|------|---------|------|------|
|SvyMonth|6       |<none>|Scale       |10    |Right    |F3    |F3    |
|--------|--------|------|------------|------|---------|------|------|
Variables in the working file

Variable Values
|--------|--|-----|
|Value   |  |Label|
|--------|--|-----|
|SvyMonth|1 |May03|
|        |2 |Jun03|
|        |3 |Jul03|
|        |4 |Aug03|
|        |5 |Sep03|
|        |6 |Oct03|
|        |7 |Nov03|
|        |8 |Dec03|
|        |9 |Jan04|
|        |10|Feb04|
|        |11|Mar04|
|        |12|Apr04|
|        |13|May04|
|        |14|Jun04|
|        |15|Jul04|
|        |16|Aug04|
|        |17|Sep04|
|        |18|Oct04|
|        |19|Nov04|
|        |20|Dec04|
|        |21|Jan05|
|        |22|Feb05|
|        |23|Mar05|
|        |24|Apr05|
|        |25|May05|
|--------|--|-----|

===================================
APPENDIX: Code, including test data
===================================
*  ................................................................. .
*  .................   Test data and file handle   ................. .

SET RNG = MT       /* 'Mersenne twister' random number generator  */ .
SET MTINDEX
         = 01232008 /*  Date of posting, mmddyyyy                  */ .

FILE HANDLE Syntax
  /NAME='C:\Documents and Settings\Richard\My Documents'              +
          '\Temporary\SPSS\'                                          +
        '2008-01-23 Douglas - Automate Add Value Labels using Python' +
        ' - '                                                         +
        'SYNTAX.SAV'.
NEW  FILE.
INPUT PROGRAM.
.  NUMERIC CaseID  (N3).
.  NUMERIC Date    (DATE11)
            Year     Month    Day (F4).
.  VALUE LABELS     Month
    1 'Jan'   2 'Feb'   3 'Mar'   4 'Apr'   5 'May'   6 'Jun'
    7 'Jul'   8 'Aug'   9 'Sep'  10 'Oct'  11 'Nov'  12 'Dec'.
.  LEAVE   CaseID   Date.
.  COMPUTE Date   = DATE.MDY(05,12,2003).
.  LOOP CaseID    = 1 TO 500
         IF Date  LE DATE.MDY(05,12,2005).
.     COMPUTE Year  = XDATE.YEAR (Date).
.     COMPUTE Month = XDATE.MONTH(Date).
.     COMPUTE Day   = XDATE.MDAY (Date).
.     END CASE.
.     COMPUTE
            Date     = XDATE.DATE(Date+RV.EXP(1/TIME.DAYS(20))).
.  END LOOP.
END FILE.
END INPUT PROGRAM.

DATASET NAME     TestData WINDOW=FRONT.


*  Ah! The 'month' variable you want to label is not calendar date,  .
*  but month 01-24 in the survey period.                             .

NUMERIC  #SvyStrt (DATE11).
COMPUTE  #SvyStrt= DATE.MDY(05,01,2003).
NUMERIC  SvyMonth (F3).
COMPUTE  SvyMonth = 1 + DATEDIFF(Date,#SvyStrt,"Months").

.  /**/  LIST  /*-*/.

*  .................   Post after this point   ..................... .
*  ................................................................. .
DATASET ACTIVATE TestData WINDOW=FRONT  /* Helpful, during testing */.

*  I.     PREPARE A FILE from which to generate code    ............ .

*  I.A    Add a dummy record for every month in the survey, to make  .
*         sure all months are represented.                           .

NEW  FILE.
INPUT PROGRAM.
.   LOOP  SvyMonth = 1 TO 24.
.      END CASE.
.   END LOOP.
END FILE.
END INPUT PROGRAM.

ADD FILES
   /FILE=TestData
   /FILE=*.

*  I.B    Create a file with one record for every survey month.      .
*         (It retains one survey date and one calendar-month value,  .
*         so the value label can be built either using the existing  .
*         label for 'Month', or the SPSS 'MONTHn' format.)           .

DATASET   DECLARE CodeGen.
AGGREGATE OUTFILE=CodeGen
    /BREAK=SvyMonth
    /FrstDate 'To have one date value in the result'        = MIN(Date)
    /CalndrMn 'To have the old "Month" value in the result' = MIN(Month)
    /NResps   'Number of responses - I''m always curious'   = NU.

DATASET ACTIVATE CodeGen  WINDOW=FRONT.

*  (The next line has nothing to do with the purpose of this code)   .
.  COMPUTE NResps = NResps - 1  /* Discount the dummy record       */.

*  Check an error that baffled me for a while.                ...... .
*  For resolution, see I.C., below.                           ...... .
.  /*--  DISPLAY DICTIONARY /VARIABLES=CalndrMn  /*-*/.

.  /*--  TEMPORARY                               /*-*/.
.  /*--  STRING  OldLabel (A5)                   /*-*/.
.  /*--  COMPUTE OldLabel=VALUELABEL(CalndrMn)   /*-*/.
.  /*--  LIST                                    /*-*/.


*  I.C    Fill in "date" and calendar month, for survey months with  .
*         no observed responses.                                     .

NUMERIC NoBreak  (F2).
COMPUTE NoBreak =  1.


AGGREGATE OUTFILE=* MODE=ADDVARIABLES
    /BREAK=NOBREAK
    /Ur_Month 'First survey month with any response' = MIN(SvyMonth)
    /Ur_Date  'The very earliest response date'      = MIN(FrstDate).

NUMERIC #Ur_Day1 (DATE11) /* First day of first response month     */
         #SvyDay1 (DATE11) /* First day of first month of survey    */.

DO IF   MISSING(FrstDate).
.  COMPUTE #Ur_Day1 = DATE.MDY(XDATE.MONTH(Ur_Date),
                                1,
                                XDATE.YEAR (Ur_Date)).
*  Filled-in date is the first day of the survey month      ....... .
.  COMPUTE #SvyDay1 = DATESUM(#Ur_Day1,(1-Ur_Month),"Months").
.  COMPUTE FrstDate = DATESUM(#Ur_Day1,(SvyMonth-1),"Months").
.  COMPUTE CalndrMn = XDATE.MONTH(FrstDate).
END IF.

.  /**/  LIST  /*-*/.

*  II.    GENERATE THE CODE, and write to the external file          .

ADD FILES
   /FILE = *
   /BY     NoBreak
   /FIRST= ItBegins
   /LAST = ItEnds.


STRING #Syntax (A60) /* To build the output line in */.

DO IF   ItBegins     /* The command header line:    */.
.  COMPUTE #Syntax = "ADD VALUE LABELS SvyMonth".
.  WRITE OUTFILE=Syntax
                /#Syntax.
END IF.

*  ....              /* The label   lines:          */.
** (In quest for clearer code, this uses more       **.
** variables and lines than strictly necessary.)    **.

STRING  #Label (A5)
         #LblMn (A3)
         #LblYr (A4).

*  Month name for label, using SPSS "MONTHn" format **.
*xxCOMPUTE #LblMn= STRING(FrstDate,MONTH3).

*  Month name for label, using existing value labels**.
+  COMPUTE #LblMn= VALUELABEL(CalndrMn).

*  Two-digit year, for label                        **.
COMPUTE #LblYr=    STRING(XDATE.YEAR(FrstDate),N4).
COMPUTE #LblYr=    SUBSTR(#LblYr,3).

*  The complete label text:                         **.
COMPUTE #Label=    CONCAT(#LblMn,#LblYr).

COMPUTE #Syntax
          = CONCAT("      "           ,
                   STRING(SvyMonth,F3),
                   "     '"           ,
                   #Label             ,
                   "'"                ).


DO IF   ItEnds       /* Add period to the last line:*/.
.  COMPUTE #Syntax = CONCAT(RTRIM(#Syntax),".").
END IF.

+  WRITE OUTFILE=Syntax / #Syntax.

EXECUTE   /* An instance where EXECUTE is needed                   */.



*  III.   TRACE AND TEST: List the generated code       ............ .

+  /**/   DATA LIST FIXED            /*-*/
    /**/        FILE=Syntax /         /*-*/
    /**/        CodeLine    01-60(A)  /*-*/.

+  /**/   LIST                       /*-*/.


*  IV.    INCLUDE AND EXECUTE the code                  ............ .

DATASET ACTIVATE TestData.

INSERT  File=Syntax.

DISPLAY DICTIONARY /VARIABLES=SvyMonth.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Automate Add Value Labels using Python (or something)?

hillel vardi
In reply to this post by King Douglas
Shalom

Here is another approach witch is using the autorecode   apply  template
feature .
The program generate a SPSS-format data file (see the spss manual under
Template File Format in the autorecode command).
The manual also state that creating SPSS-format file is not tested .
After creating the file with the spss format you can apply it to any
variable using the autorecode command .
As you can see the syntax is vary short .
To test the program change the file name in the get command to any file
that have date variable and rename that variable labdate .


title     "Automate Add Value Labels" .
*** use file with dates to get first and last date ***** ,
get           file='P:\Project\sprinkles\spss\lab.sav'  /keep=tz labdate  .
compute  dummi=1.
AGGREGATE  /OUTFILE=*  /BREAK=dummi
  /mindate = MIN(labdate) /maxdate = MAX(labdate).
compute      last_my=datediff(maxdate,mindate,'month').
string   strdate(a6) .
loop   i=1 to last_my .
COMPUTE tmpdate =DATESUM(mindate, i, "months", "closest").
compute strdate=string(tmpdate,MOYR6) .
**** save a template file **** .
xsave     outfile='c:\temp\var1_template.sat'/
           rename( strdate i = Source_ Target_ )  / keep=Source_ Target_ .
xsave     outfile='c:\temp\dates1.sav'/  keep=strdate i .
end loop.
execute .
get      file='c:\temp\dates1.sav' .
****  apply the template to a variable *** .
autorecode  strdate / into idate /applay
TEMPLATE='c:\temp\var1_template.sat'.
*** see the generated value for idate in the variable view window ***** .

Hillel vardi
BGU


King Douglas wrote:

> Folks,
>
> I'm working on a way to automate an Add Values statement for an automatically generated numeric variable for month (1 to 24) that represents a selected time period across a number of files.
>
> For instance, say I have quarterly and monthly tracking survey results going back a number of years.  The survey dates are represented in three numeric variables, Year, Month, Day.
>
> I can create the new variable values (1-24) with ordinary syntax, but now want to automatically add value labels ("MoYr") based on the 24-month sequence.
>
> The "Month" variable already contains value labels (Jan, Feb...Dec).  I think I could use Python to extract the labels and simply append the year, writing the results to a syntax file that I can insert into my program.
>
> Now that I have theoretically solved this problem, I'll get to work.  :>)
> If anyone has any handy code or other suggestions while I'm working at it, that would be much appreciated.
>
> King Douglas
> American Airlines Customer Research
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>
>

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD