|
This is an off-list message from H. Ozgen [[hidden email]]. If you
reply off-list, please do so to him/her and not me. Better yet, reply to the list. I have a dataset of 400 patietns (age range: 3-18) and 1000( age range 8-14) controls. I am trying to match them with sex and +/- 1 or 2 age. how can I do that? is there any syntax for this? any suggestions welcome. THANKS in advance for considering this problem. H Ozgen Heval Ozgen, MD Department of Child and Adolescent Psychiatry University Medical Center, Heidelberglaan 100 3508 GA Utrecht HP B 01.324 tel.: ++ 31 30 250 6362 fax.:++ 31 30 250 5444 ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
At 10:37 AM 1/21/2008, Ozgen, H.M. wrote, off-list:
[...] >THANKS in advance for considering this problem. I join with Gene Maguin in feeling that questions, and responses, should usually go to the whole list. + Posting invites all interested list members to apply their knowledge and energy to the problem. + Posted questions and responses can inform everyone on the list; off-list questions and responses inform only one person. In that spirit, continuing on-list, >I have a dataset of 400 patietns (age range: 3-18) and 1000( age >range 8-14) controls. >I am trying to match them with sex and +/- 1 or 2 age. First, a simple question: if your patients' ages are from 3-18 and your controls are all 8 years of age and over, how will you match those patients less than 8 years old? Finding best matches between cases and candidate controls by comparing all possible pairs is a many-to-many merge, somewhat awkward. But it may be much easier, in your case. With those sizes, you probably have many cases and controls of both sexes, for every year of age. It could reduce to randomly selecting as many controls as you need for each sex-year combination, and randomly assigning those to the cases with the same sex and year of age. The two random operations can be done as one, using a random sort. Does this get you any closer? -With best wishes, Richard Ristow ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Maguin, Eugene
Thank you.
in the patient group sex ratio is 4/1 ( M/f) whereas in the control group that is 1/1. For age matching i can go +/- 1 if that will give me too less patients then may try +/-2. and for your question: how will i match the patients aged less than 8: i can still match those are 6 or 7 yrs old, but the rest i loose.. this is also true for age 16 and above. H ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Richard Ristow
All,
This is a reply to H. M. Ozgen who several days ago posted this question: >>I have a dataset of 400 patients (age range: 3-18) and 1000( age range 8-14) controls. I am trying to match them with sex and +/- 1 or 2 age. Below is the syntax file I sent that creates two example data files meeting his specifications and the syntax necessary to perform the matching operation. Note that because of Ozgen's descriptions of the two samples and my assumptions in building the example datasets, the solution was generally simple. The simplicity came from the fact that there was an excess of controls at every sex-age combination, which allowed me to have an exact age match, except for the 6 and 7 year old patient groups. The 6 and 7 year old patient groups imposed complexity because they could be matched with 8 year old controls of the same sex. The programming complexity would have been much greater had there been insufficient numbers of controls for an exact sex-age match. Lastly, I welcome comments or questions on my code so that I can improve my understanding of this type of problem. **********************************************************. **********************************************************. * I'll assume you have patients in one file, called patients, and * controls in another file, called controls. * The key variables are id, age, sex. They are named as follows: * Patient file: id=cid, age=page, sex=psex (1=male, 2=female). * Control file: id=cid, age=cage, sex=csex (1=male, 2=female). * preliminary. create example data files. input program. /* patient file. loop #i=1 to 400. + compute pid=#i. + compute psex=trunc(uniform(2))+1. + compute page=trunc(uniform(16))+3. end case. end loop. end file. end input program. format pid(f3.0) psex(F1.0) page(f2.0). frequencies pid psex page. save outfile='c:\temp\patient.sav'. input program. /* control file. loop #i=1 to 1000. + compute cid=#i. + compute csex=trunc(uniform(2))+1. + compute cage=trunc(uniform(7))+8. end case. end loop. end file. end input program. format cid(f3.0) csex(F1.0) cage(f2.0). frequencies cid csex cage. save outfile='c:\temp\control.sav'. **************************************************. **************************************************. * Below here begins the matching operations. **************************************************. *A. Patient file activities. Get file='c:\temp\patient.sav'. * Make a string composite sex plus age variable. String link(a3). Compute link=concat(string(psex,f1.0),string(trunc(page),f2.0)). Frequencies link. /* to see how many combinations you have. * Aggregate and save file as count of cases needing a match. Aggregate outfile='c:\temp\patientagg1.sav'/break=link/precs=nu. execute. *B. Control file activities. Get file='c:\temp\control.sav'. * Make a string composite sex plus age variable. String link(a3). Compute link=concat(string(csex,f1.0),string(trunc(cage),f2.0)). Frequencies link. /* to see how many combinations you have. * Aggregate and save file as count of cases needing a match. Aggregate outfile='c:\temp\controlagg1.sav'/break=link/crecs=nu. execute. *C. Assessing the matching prospects. * This is the critical operation as you see for the first * time what the problems will be. What the crosstabs show * is the number of patient cases and the number of control * cases for each value of the matching variables. match files file='c:\temp\patientagg1.sav'/ file='c:\temp\controlagg1.sav'/by link. crosstabs precs by crecs by link. * Conclusion: in every case the number control records available for * matching is greater than the number of patient records needing a * match. This is best possibole outcome. However, will need to consider * distribution end effects for lower end of patient distribution. * D. Back to control file. Get file='c:\temp\control.sav'. * Make a string composite sex plus age variable. String link(a3). Compute link=concat(string(csex,f1.0),string(trunc(cage),f2.0)). Frequencies link. /* to see how many combinations you have. Sort cases by link. * Match the aggregated patient file to the control file. match files file=*/table='c:\temp\patientagg1.sav'/by link. execute. * Draw a number from a uniform distribution, sort by link and random * value, then number cases within link value. compute ranval=uniform(1). sort cases by link ranval. compute linkrec=1. if (link eq lag(link)) linkrec=lag(linkrec)+1. execute. * keep the record if the rec number is less than equal to number of patient * records for that link value. compute pick=0. if (linkrec le precs) pick=1. execute. * save control file for later use for distribution end effects. * then select and save records needed for matching. Save outfile='c:\temp\control1.sav'. select if (pick eq 1). execute. Save outfile='c:\temp\controlmatch1.sav'/drop ranval pick precs. *E. Back to Patient file. Get file='c:\temp\patient.sav'. * Make a string composite sex plus age variable. String link(a3). Compute link=concat(string(psex,f1.0),string(trunc(page),f2.0)). * sort cases by link and then number (use same variable names). sort cases by link. compute linkrec=1. if (link eq lag(link)) linkrec=lag(linkrec)+1. execute. * finally, match this file with the selected control cases. match files file=*/file='c:\temp\controlmatch1.sav'/by link linkrec. execute. Save outfile='c:\temp\patient1.sav'. *F. Accounting for end effects. * problem is that 6 yr old and 7 yr old patients can be matched to unused * 8 yr old controls. Get file='c:\temp\control1.sav'. * select unsed male 8 yr olds and unused female 8 yr olds. select if ((link eq '1 8' or link eq '2 8') and pick eq 0). execute. * review frequency distribution for patient file link to see how many * male and female 6 yr old and 7 yr old patients there are. * there are 9 male 6 yr old patients, * 10 male 7 yr old patients, * 13 female 6 yr old patients, and * 14 female 7 yr old patients. * how many control remain?. frequencies link. * 42 8 yr old control males and 46 8 yr old control females remain. Enough. * now select them. renumber cases within link. basically divide cases by * hand, then change value of link to '1 6', '1 7', '2 6', '2 7' and * correct case numbering. compute linkrec=1. if (link eq lag(link)) linkrec=lag(linkrec)+1. execute. do if (link eq '1 8'). + do if (linkrec le 9). + compute link='1 6'. + compute pick=1. + else if (linkrec ge 10 and linkrec le 19). + compute link='1 7'. + compute linkrec=linkrec-9. + compute pick=1. + else. + compute pick=0. + end if. else if (link eq '2 8'). + do if (linkrec le 13). + compute link='2 6'. + compute pick=1. + else if (linkrec ge 14 and linkrec le 27). + compute link='2 7'. + compute linkrec=linkrec-13. + compute pick=1. + else. + compute pick=0. + end if. end if. execute. * better visually check this for correct operation. * OK. select control cases to retain. then match to new patient file. select if (pick eq 1). execute. match files file=*/file='c:\temp\patient1.sav'/by link linkrec. execute. Save outfile='c:\temp\patient2.sav'/drop ranval pick precs. *G. Final check. get file='c:\temp\patient2.sav'. String psexage csexage(a3). Compute psexage=concat(string(psex,f1.0),string(trunc(page),f2.0)). Compute csexage=concat(string(csex,f1.0),string(trunc(cage),f2.0)). crosstab psexage by csexage by link. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by H-19
Heval,
The code I sent earlier today won't work for male patients. I didn't know about the differential sex distribution in the patient file. I need more details. Would you post to the list a crosstab of age by sex for patients and the same for controls. The code will work for female patients and female controls but will need some modification because you do not need to account for sex. Gene Maguin ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Maguin, Eugene
age * gender Crosstabulation
Count diagnose gender Total male female control age 8 4 0 4 9 93 80 173 10 171 167 338 11 212 201 413 12 37 35 72 13 2 3 5 14 0 2 2 Total 519 488 1007 asd age 3 10 2 12 4 19 2 21 5 20 3 23 6 15 4 19 7 25 4 29 8 27 5 32 9 32 4 36 10 18 7 25 11 23 3 26 12 16 9 25 13 25 3 28 14 19 1 20 15 13 1 14 16 12 1 13 17 3 3 6 18 5 3 8 Total 282 55 337 ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Maguin, Eugene
Folks,
I'm working on a way to automate an Add Values statement for an automatically generated numeric variable for month (1 to 24) that represents a selected time period across a number of files. For instance, say I have quarterly and monthly tracking survey results going back a number of years. The survey dates are represented in three numeric variables, Year, Month, Day. I can create the new variable values (1-24) with ordinary syntax, but now want to automatically add value labels ("MoYr") based on the 24-month sequence. The "Month" variable already contains value labels (Jan, Feb...Dec). I think I could use Python to extract the labels and simply append the year, writing the results to a syntax file that I can insert into my program. Now that I have theoretically solved this problem, I'll get to work. :>) If anyone has any handy code or other suggestions while I'm working at it, that would be much appreciated. King Douglas American Airlines Customer Research ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Maguin, Eugene
At 10:37 AM 1/21/2008, Ozgen, H.M. wrote, initially off-list:
>I have a dataset of 400 patietns (age range: 3-18) and 1000( age >range 8-14) controls. >I am trying to match them with sex and +/- 1 or 2 age. Here's a statistical question: Is this necessary, or useful? Talk to your advisor, of course. (Not that you have any choice.) Studies with matched controls are always expensive, either in resources (to enroll enough extra controls to get matches for all subjects), or in sample size (because of the enrolled controls that aren't chosen as matches). And it's likely enough you still won't match on all characteristics (including ones you don't know about) that are relevant to the outcome. You don't give your full study design, but from what you've said, I wonder whether comparing subject and control groups with the usual inter-group ANOVA/ANCOVA/... procedures wouldn't be at least as powerful, in addition to being much easier. APPENDIX: Where are matched-control designs useful? . Where you have very large populations of candidate subjects and controls; you're willing to spend a lot of money on the study; and you want interpretation to be unambiguous even to those who don't know statistics. The example that comes to mind is the studies (in the United States) that showed the adverse health effects of cigarette smoking. In this case, a.) the candidate populations were huge, namely all smokers and all non-smokers in the United States; b.) the American Cancer Society and others were willing to spend a lot of money, as they had it, and regarded the results as critically important; c.) unambiguous interpretation was crucial, as tobacco-industry groups, etc., would devote considerable resources to discrediting any positive results. . In experimental studies that lend themselves to enrolling paired subjects and controls For example, see split-plot studies in agriculture. . Where there are many more candidate controls than there are subjects; assessing whether a candidate control matches a subject is easy; and gathering data on a control is expensive, so you don't enroll all the candidates. I've been on two of these: 1) A comparison of infants born to diabetic vs. non-diabetic mothers. Candidate populations were all births at a large maternity hospital, with diabetic mothers a small fraction of all births. It was easy to select controls matched by near-same birth date; I don't remember what other characteristics were matched on 2) A comparison of urban buildings that had experienced a major fire, with ones that had not. Choosing a geographically matched qualified control from nearby un-burned buildings was fairly easy. In the study design it was crucial, since the purpose was to elicit effects other than riskiness of location. -Onward, ever onward, Richard ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Maguin, Eugene
Thank you!
You are so very right by asking whether this is necessary, or useful? I have already discussed this issue several times with my supervisors they insist that i should match, whereas my statistician does not agree. In the end what i look is the odds ratio between the patients and controls.the rest of the analyses ( e.g. clustering of the variables, 683 variables i have; for that i will not use the matchied pairs) So i will only use the matched pairs to see the odds between these cases and controls. Many thanks H ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by King Douglas
Maybe I don't understand what you need, but if all you need is a "label" that provides month and year:
compute datevar=date.mdy(month, day, year). formats datevar (moyr8). The end result is that datevar values are displayed in moyr format (e.g., OCT 2007). -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of King Douglas Sent: Wednesday, January 23, 2008 11:31 AM To: [hidden email] Subject: Automate Add Value Labels using Python (or something)? Folks, I'm working on a way to automate an Add Values statement for an automatically generated numeric variable for month (1 to 24) that represents a selected time period across a number of files. For instance, say I have quarterly and monthly tracking survey results going back a number of years. The survey dates are represented in three numeric variables, Year, Month, Day. I can create the new variable values (1-24) with ordinary syntax, but now want to automatically add value labels ("MoYr") based on the 24-month sequence. The "Month" variable already contains value labels (Jan, Feb...Dec). I think I could use Python to extract the labels and simply append the year, writing the results to a syntax file that I can insert into my program. Now that I have theoretically solved this problem, I'll get to work. :>) If anyone has any handy code or other suggestions while I'm working at it, that would be much appreciated. King Douglas American Airlines Customer Research ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Thanks, Richard,
If I were to revise what I want each month, it would look something like this. Aggregate more than two year's of data and select a 24-month period, say March 2005 through February 2007. Compute a sequential,numeric variable (MONTHVAR) for each month such that March 2005 = 1 and February 2007 = 24. Write a value labels statement VALUE LABELS MONTHVAR 1 'Mar 2005' 2 'Apr 2005' . . . . . 22 'Dec 2006' 23 'Jan 2007' 24 'Feb 2007'. What I want to do is automate the Value Labels statement to account only for the periods contained in the data file and limit the labels to those months. I think I've just about got it figured out using AGGREGATE and writing ADD VALUES syntax to an SPS file. I don't think Python is necesssary or even the best way to go. Thanks again for thinking about this, King "Oliver, Richard" <[hidden email]> wrote: Maybe I don't understand what you need, but if all you need is a "label" that provides month and year: compute datevar=date.mdy(month, day, year). formats datevar (moyr8). The end result is that datevar values are displayed in moyr format (e.g., OCT 2007). -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of King Douglas Sent: Wednesday, January 23, 2008 11:31 AM To: [hidden email] Subject: Automate Add Value Labels using Python (or something)? Folks, I'm working on a way to automate an Add Values statement for an automatically generated numeric variable for month (1 to 24) that represents a selected time period across a number of files. For instance, say I have quarterly and monthly tracking survey results going back a number of years. The survey dates are represented in three numeric variables, Year, Month, Day. I can create the new variable values (1-24) with ordinary syntax, but now want to automatically add value labels ("MoYr") based on the 24-month sequence. The "Month" variable already contains value labels (Jan, Feb...Dec). I think I could use Python to extract the labels and simply append the year, writing the results to a syntax file that I can insert into my program. Now that I have theoretically solved this problem, I'll get to work. :>) If anyone has any handy code or other suggestions while I'm working at it, that would be much appreciated. King Douglas American Airlines Customer Research ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by H-19
Hi H
You wrote "You are so very right by asking whether this is necessary, or useful? I have already discussed this issue several times with my supervisors they insist that i should match, whereas my statistician does not agree. In the end what i look is the odds ratio between the patients and controls.the rest of the analyses ( e.g. clustering of the variables, 683 variables i have; for that i will not use the matchied pairs) So i will only use the matched pairs to see the odds between these cases and controls." Are your supervisors aware of the fact that matching data complicates the statistical analysis? You will need to use either McNemar's OR or conditional logistic regression. My opinion: if your statistician doesn't think it's necessary, he/she is probably quite right (I suppose you have reached the conclusion that I'm a statistician too). Matching is done to avoid confounding caused by factors known to affect the outcome, but a wrong statistical analysis couls bias your results a lot. HTH, Marta GarcÃa-Granero ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Oliver, Richard
Folks,
I needed to automate a process of creating a new numeric variable, with appropriate Month-Year value labels, to represent 24 months of data where the time periods would change from time to time. This will work for any time period that has numeric variables for Year (yyyy) and Month (1-12), even if the months are not consecutive. Using SPSS 15 base only, the following does the trick. I'm sure this could be simplified or done other ways, including Python. Starting with the master data file, selected for the periods of interest: * COPY THE "MASTER" DATA FILE (SORTED ON YEAR, MONTH). * THE COPY WILL BE CONDENSED INTO A "LOOKUP" TABLE. DATASET COPY LOOKUP. DATASET ACTIVATE LOOKUP. * SELECT ONLY THE FIRST CASE FOR EACH YEAR-MONTH PAIR. MATCH FILES FILE = * /BY YEAR MONTH /FIRST = FYRMO. SELECT IF FYRMO. * CREATE A NUMERIC VARIABLE THAT WILL CONTAIN VALUES 1 TO N<100, * REFLECTING THE NUMBER OF YEAR-MONTH PAIRS. NUMERIC MONTHVAR (F2.0). COMPUTE MONTHVAR = $CASENUM. * ESTABLISH A FIRST AND LAST CASE FOR THE CONDENSED FILE. MATCH FILES FILE = * /BY MONTHVAR /LAST = LAST /FIRST = FIRST. * CREATE A STRING VARIABLE TO REPRESENT EVERY VALUE FOR THE 12-MONTH CYCLE. STRING MONTHVAL (A4). DO REPEAT P = 1 TO 12 /Q = 'Jan ' 'Feb ' 'Mar ' 'Apr ' 'May ' 'Jun ' 'Jul ' 'Aug ' 'Sep ' 'Oct ' 'Nov ' 'Dec '. IF MONTH = P MONTHVAL = Q. END REPEAT PRINT. * CREATE A STRING VARIABLE AS A PLACEHOLDER FOR THE * VALUE LABELS FOR EACH OF THE YEAR-MONTH PAIRS. * ADD THE TWO-DIGIT YEAR AFTER THE MONTH. STRING MONTHVARVAL (A6). COMPUTE MONTHVARVAL = CONCATENATE(MONTHVAL,STRING(YEAR-2000,N2)). * WRITE THE VALUE LABELS TO A SYNTAX FILE. * (MODIFIED FROM RAY LEVESQUE'S SPSSTOOLS.NET). DO IF FIRST. WRITE OUTFILE = "C:\TEMP\ADDVAL.SPS" /"ADD VALUE LABELS MONTHVAR " MONTHVAR " '" MONTHVARVAL "'.". ELSE IF NOT LAST. WRITE OUTFILE = "C:\TEMP\ADDVAL.SPS" /"ADD VALUE LABELS MONTHVAR " MONTHVAR " '" MONTHVARVAL "'.". ELSE. WRITE OUTFILE = "C:\TEMP\ADDVAL.SPS" /"ADD VALUE LABELS MONTHVAR " MONTHVAR " '" MONTHVARVAL "'.". END IF. * KEEP ONLY THE VARIABLES ESSENTIAL TO A LOOKUP TABLE. MATCH FILE FILE = * /KEEP YEAR MONTH MONTHVAR. * ACTIVATE THE PRE-SORTED MASTER FILE AND * MATCH IT TO THE LOOKUP TABLE. DATASET ACTIVATE MASTER. MATCH FILES FILE = * /TABLE = LOOKUP /BY YEAR MONTH. * INSERT THE ADD VALUE LABELS PROGRAM AND TEST THE RESULTS. INSERT FILE = "C:\TEMP\ADDVAL.SPS". MATCH FILES FILE = * /KEEP YEAR MONTH MONTHVAR. FREQ MONTHVAR. -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of King Douglas Sent: Wednesday, January 23, 2008 11:31 AM To: [hidden email] Subject: Automate Add Value Labels using Python (or something)? Folks, I'm working on a way to automate an Add Values statement for an automatically generated numeric variable for month (1 to 24) that represents a selected time period across a number of files. For instance, say I have quarterly and monthly tracking survey results going back a number of years. The survey dates are represented in three numeric variables, Year, Month, Day. I can create the new variable values (1-24) with ordinary syntax, but now want to automatically add value labels ("MoYr") based on the 24-month sequence. The "Month" variable already contains value labels (Jan, Feb...Dec). I think I could use Python to extract the labels and simply append the year, writing the results to a syntax file that I can insert into my program. Now that I have theoretically solved this problem, I'll get to work. :>) If anyone has any handy code or other suggestions while I'm working at it, that would be much appreciated. King Douglas American Airlines Customer Research ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by King Douglas
At 12:31 PM 1/23/2008, King Douglas wrote:
>I'm working on a way to automate an Add Values statement for an >automatically generated numeric variable for month (1 to 24) that >represents a selected time period across a number of files. > >I think I could use Python to extract the labels, writing the >results to a syntax file that I can insert into my program. You may find native SPSS code is simpler for this. It's certainly handled "write and insert" logic well, for a long time. >For instance, say I have quarterly and monthly tracking survey >results going back a number of years. The survey dates are >represented in three numeric variables, Year, Month, Day. > >I can create the new variable values (1-24) with ordinary syntax, >but now want to automatically add value labels ("MoYr") based on the >24-month sequence. Since it looks like you're labeling with calendar month-year, would assigning date format "MOYR6" serve your turn? No, I think I see why not: your 'new variable' is the sequential month within the survey, running 1-24. (Right; that's what you responded to Richard Oliver.) >The "Month" variable already contains value labels (Jan, >Feb...Dec). I think I could use Python to extract the labels and >simply append the year, writing the results to a syntax file that I >can insert into my program. > >Now that I have theoretically solved this problem, I'll get to work. :>) >If anyone has any handy code or other suggestions while I'm working >at it, that would be much appreciated. Below is a native-SPSS solution. OK, here's to me, for thinking code-generating code could be 'simple'. Python has the advantage of direct access to the data dictionary, including the value labels. On the other hand, the Python solution may be somewhat complicated, because the mapping of survey month to calendar month is an attribute of the data, not the data dictionary. Anyhow, see how you like this. Test data, and code without interspersed output lines, follow the listing. |-----------------------------|---------------------------| |Output Created |23-JAN-2008 19:08:09 | |-----------------------------|---------------------------| [TestData] CaseID Date Year Month Day SvyMonth 001 12-MAY-2003 2003 5 12 1 002 21-JUN-2003 2003 6 21 2 003 30-JUN-2003 2003 6 30 2 004 16-JUL-2003 2003 7 16 3 005 14-SEP-2003 2003 9 14 5 006 29-SEP-2003 2003 9 29 5 007 03-OCT-2003 2003 10 3 6 008 08-OCT-2003 2003 10 8 6 009 03-NOV-2003 2003 11 3 7 010 10-NOV-2003 2003 11 10 7 011 13-DEC-2003 2003 12 13 8 012 17-DEC-2003 2003 12 17 8 013 23-DEC-2003 2003 12 23 8 014 23-JAN-2004 2004 1 23 9 015 01-FEB-2004 2004 2 1 10 016 01-FEB-2004 2004 2 1 10 017 05-FEB-2004 2004 2 5 10 018 11-FEB-2004 2004 2 11 10 019 17-FEB-2004 2004 2 17 10 020 23-FEB-2004 2004 2 23 10 021 02-APR-2004 2004 4 2 12 022 14-APR-2004 2004 4 14 12 023 14-APR-2004 2004 4 14 12 024 04-MAY-2004 2004 5 4 13 025 21-MAY-2004 2004 5 21 13 026 04-JUN-2004 2004 6 4 14 027 22-JUN-2004 2004 6 22 14 028 09-JUL-2004 2004 7 9 15 029 05-OCT-2004 2004 10 5 18 030 28-OCT-2004 2004 10 28 18 031 31-OCT-2004 2004 10 31 18 032 03-NOV-2004 2004 11 3 19 033 09-NOV-2004 2004 11 9 19 034 16-DEC-2004 2004 12 16 20 035 11-JAN-2005 2005 1 11 21 036 20-FEB-2005 2005 2 20 22 037 18-APR-2005 2005 4 18 24 038 29-APR-2005 2005 4 29 24 039 07-MAY-2005 2005 5 7 25 Number of cases read: 39 Number of cases listed: 39 * ................. Post after this point ..................... . * ................................................................. . DATASET ACTIVATE TestData WINDOW=FRONT /* Helpful, during testing */. * I. PREPARE A FILE from which to generate code ............ . * I.A Add a dummy record for every month in the survey, to make . * sure all months are represented. . NEW FILE. INPUT PROGRAM. . LOOP SvyMonth = 1 TO 24. . END CASE. . END LOOP. END FILE. END INPUT PROGRAM. ADD FILES /FILE=TestData /FILE=*. * I.B Create a file with one record for every survey month. . * (It retains one survey date and one calendar-month value, . * so the value label can be built either using the existing . * label for 'Month', or the SPSS 'MONTHn' format.) . DATASET DECLARE CodeGen. AGGREGATE OUTFILE=CodeGen /BREAK=SvyMonth /FrstDate 'To have one date value in the result' = MIN(Date) /CalndrMn 'To have the old "Month" value in the result' = MIN(Month) /NResps 'Number of responses - I''m always curious' = NU. DATASET ACTIVATE CodeGen WINDOW=FRONT. * (The next line has nothing to do with the purpose of this code) . . COMPUTE NResps = NResps - 1 /* Discount the dummy record */. * Check an error that baffled me for a while. ...... . * For resolution, see I.C., below. ...... . . /*-- DISPLAY DICTIONARY /VARIABLES=CalndrMn /*-*/. . /*-- TEMPORARY /*-*/. . /*-- STRING OldLabel (A5) /*-*/. . /*-- COMPUTE OldLabel=VALUELABEL(CalndrMn) /*-*/. . /*-- LIST /*-*/. * I.C Fill in "date" and calendar month, for survey months with . * no observed responses. . NUMERIC NoBreak (F2). COMPUTE NoBreak = 1. AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=NOBREAK /Ur_Month 'First survey month with any response' = MIN(SvyMonth) /Ur_Date 'The very earliest response date' = MIN(FrstDate). NUMERIC #Ur_Day1 (DATE11) /* First day of first response month */ #SvyDay1 (DATE11) /* First day of first month of survey */. DO IF MISSING(FrstDate). . COMPUTE #Ur_Day1 = DATE.MDY(XDATE.MONTH(Ur_Date), 1, XDATE.YEAR (Ur_Date)). * Filled-in date is the first day of the survey month ....... . . COMPUTE #SvyDay1 = DATESUM(#Ur_Day1,(1-Ur_Month),"Months"). . COMPUTE FrstDate = DATESUM(#Ur_Day1,(SvyMonth-1),"Months"). . COMPUTE CalndrMn = XDATE.MONTH(FrstDate). END IF. . /**/ LIST /*-*/. List |-----------------------------|---------------------------| |Output Created |23-JAN-2008 19:08:12 | |-----------------------------|---------------------------| [CodeGen] SvyMonth FrstDate CalndrMn NResps NoBreak Ur_Month Ur_Date 1 12-MAY-2003 5 1 1 1 12-MAY-2003 2 21-JUN-2003 6 2 1 1 12-MAY-2003 3 16-JUL-2003 7 1 1 1 12-MAY-2003 4 01-AUG-2003 8 0 1 1 12-MAY-2003 5 14-SEP-2003 9 2 1 1 12-MAY-2003 6 03-OCT-2003 10 2 1 1 12-MAY-2003 7 03-NOV-2003 11 2 1 1 12-MAY-2003 8 13-DEC-2003 12 3 1 1 12-MAY-2003 9 23-JAN-2004 1 1 1 1 12-MAY-2003 10 01-FEB-2004 2 6 1 1 12-MAY-2003 11 01-MAR-2004 3 0 1 1 12-MAY-2003 12 02-APR-2004 4 3 1 1 12-MAY-2003 13 04-MAY-2004 5 2 1 1 12-MAY-2003 14 04-JUN-2004 6 2 1 1 12-MAY-2003 15 09-JUL-2004 7 1 1 1 12-MAY-2003 16 01-AUG-2004 8 0 1 1 12-MAY-2003 17 01-SEP-2004 9 0 1 1 12-MAY-2003 18 05-OCT-2004 10 3 1 1 12-MAY-2003 19 03-NOV-2004 11 2 1 1 12-MAY-2003 20 16-DEC-2004 12 1 1 1 12-MAY-2003 21 11-JAN-2005 1 1 1 1 12-MAY-2003 22 20-FEB-2005 2 1 1 1 12-MAY-2003 23 01-MAR-2005 3 0 1 1 12-MAY-2003 24 18-APR-2005 4 2 1 1 12-MAY-2003 25 07-MAY-2005 5 0 1 1 12-MAY-2003 Number of cases read: 25 Number of cases listed: 25 * II. GENERATE THE CODE, and write to the external file . ADD FILES /FILE = * /BY NoBreak /FIRST= ItBegins /LAST = ItEnds. STRING #Syntax (A60) /* To build the output line in */. DO IF ItBegins /* The command header line: */. . COMPUTE #Syntax = "ADD VALUE LABELS SvyMonth". . WRITE OUTFILE=Syntax /#Syntax. END IF. * .... /* The label lines: */. ** (In quest for clearer code, this uses more **. ** variables and lines than strictly necessary.) **. STRING #Label (A5) #LblMn (A3) #LblYr (A4). * Month name for label, using SPSS "MONTHn" format **. *xxCOMPUTE #LblMn= STRING(FrstDate,MONTH3). * Month name for label, using existing value labels**. + COMPUTE #LblMn= VALUELABEL(CalndrMn). * Two-digit year, for label **. COMPUTE #LblYr= STRING(XDATE.YEAR(FrstDate),N4). COMPUTE #LblYr= SUBSTR(#LblYr,3). * The complete label text: **. COMPUTE #Label= CONCAT(#LblMn,#LblYr). COMPUTE #Syntax = CONCAT(" " , STRING(SvyMonth,F3), " '" , #Label , "'" ). DO IF ItEnds /* Add period to the last line:*/. . COMPUTE #Syntax = CONCAT(RTRIM(#Syntax),"."). END IF. + WRITE OUTFILE=Syntax / #Syntax. EXECUTE /* An instance where EXECUTE is needed */. * III. TRACE AND TEST: List the generated code ............ . + /**/ DATA LIST FIXED /*-*/ /**/ FILE=Syntax / /*-*/ /**/ CodeLine 01-60(A) /*-*/. Data List will read 1 records from C:\Documents and Settings\...hon - SYNTAX.SAV Variable Rec Start End Format CodeLine 1 1 60 A60 + /**/ LIST /*-*/. List |-----------------------------|---------------------------| |Output Created |23-JAN-2008 19:08:14 | |-----------------------------|---------------------------| CodeLine ADD VALUE LABELS SvyMonth 1 'May03' 2 'Jun03' 3 'Jul03' 4 'Aug03' 5 'Sep03' 6 'Oct03' 7 'Nov03' 8 'Dec03' 9 'Jan04' 10 'Feb04' 11 'Mar04' 12 'Apr04' 13 'May04' 14 'Jun04' 15 'Jul04' 16 'Aug04' 17 'Sep04' 18 'Oct04' 19 'Nov04' 20 'Dec04' 21 'Jan05' 22 'Feb05' 23 'Mar05' 24 'Apr05' 25 'May05'. Number of cases read: 26 Number of cases listed: 26 * IV. INCLUDE AND EXECUTE the code ............ . DATASET ACTIVATE TestData. INSERT File=Syntax. 227 ADD VALUE LABELS SvyMonth 228 1 'May03' 229 2 'Jun03' 230 3 'Jul03' 231 4 'Aug03' 232 5 'Sep03' 233 6 'Oct03' 234 7 'Nov03' 235 8 'Dec03' 236 9 'Jan04' 237 10 'Feb04' 238 11 'Mar04' 239 12 'Apr04' 240 13 'May04' 241 14 'Jun04' 242 15 'Jul04' 243 16 'Aug04' 244 17 'Sep04' 245 18 'Oct04' 246 19 'Nov04' 247 20 'Dec04' 248 21 'Jan05' 249 22 'Feb05' 250 23 'Mar05' 251 24 'Apr05' 252 25 'May05'. 253 254 * End of INSERT and INCLUDE nesting level 01. DISPLAY DICTIONARY /VARIABLES=SvyMonth. File Information |-----------------------------|---------------------------| |Output Created |23-JAN-2008 19:08:14 | |-----------------------------|---------------------------| [TestData] Variable Information |--------|--------|------|------------|------|---------|------|------| |Variable|Position|Label |Measurement |Column|Alignment|Print |Write | | | | |Level |Width | |Format|Format| |--------|--------|------|------------|------|---------|------|------| |SvyMonth|6 |<none>|Scale |10 |Right |F3 |F3 | |--------|--------|------|------------|------|---------|------|------| Variables in the working file Variable Values |--------|--|-----| |Value | |Label| |--------|--|-----| |SvyMonth|1 |May03| | |2 |Jun03| | |3 |Jul03| | |4 |Aug03| | |5 |Sep03| | |6 |Oct03| | |7 |Nov03| | |8 |Dec03| | |9 |Jan04| | |10|Feb04| | |11|Mar04| | |12|Apr04| | |13|May04| | |14|Jun04| | |15|Jul04| | |16|Aug04| | |17|Sep04| | |18|Oct04| | |19|Nov04| | |20|Dec04| | |21|Jan05| | |22|Feb05| | |23|Mar05| | |24|Apr05| | |25|May05| |--------|--|-----| =================================== APPENDIX: Code, including test data =================================== * ................................................................. . * ................. Test data and file handle ................. . SET RNG = MT /* 'Mersenne twister' random number generator */ . SET MTINDEX = 01232008 /* Date of posting, mmddyyyy */ . FILE HANDLE Syntax /NAME='C:\Documents and Settings\Richard\My Documents' + '\Temporary\SPSS\' + '2008-01-23 Douglas - Automate Add Value Labels using Python' + ' - ' + 'SYNTAX.SAV'. NEW FILE. INPUT PROGRAM. . NUMERIC CaseID (N3). . NUMERIC Date (DATE11) Year Month Day (F4). . VALUE LABELS Month 1 'Jan' 2 'Feb' 3 'Mar' 4 'Apr' 5 'May' 6 'Jun' 7 'Jul' 8 'Aug' 9 'Sep' 10 'Oct' 11 'Nov' 12 'Dec'. . LEAVE CaseID Date. . COMPUTE Date = DATE.MDY(05,12,2003). . LOOP CaseID = 1 TO 500 IF Date LE DATE.MDY(05,12,2005). . COMPUTE Year = XDATE.YEAR (Date). . COMPUTE Month = XDATE.MONTH(Date). . COMPUTE Day = XDATE.MDAY (Date). . END CASE. . COMPUTE Date = XDATE.DATE(Date+RV.EXP(1/TIME.DAYS(20))). . END LOOP. END FILE. END INPUT PROGRAM. DATASET NAME TestData WINDOW=FRONT. * Ah! The 'month' variable you want to label is not calendar date, . * but month 01-24 in the survey period. . NUMERIC #SvyStrt (DATE11). COMPUTE #SvyStrt= DATE.MDY(05,01,2003). NUMERIC SvyMonth (F3). COMPUTE SvyMonth = 1 + DATEDIFF(Date,#SvyStrt,"Months"). . /**/ LIST /*-*/. * ................. Post after this point ..................... . * ................................................................. . DATASET ACTIVATE TestData WINDOW=FRONT /* Helpful, during testing */. * I. PREPARE A FILE from which to generate code ............ . * I.A Add a dummy record for every month in the survey, to make . * sure all months are represented. . NEW FILE. INPUT PROGRAM. . LOOP SvyMonth = 1 TO 24. . END CASE. . END LOOP. END FILE. END INPUT PROGRAM. ADD FILES /FILE=TestData /FILE=*. * I.B Create a file with one record for every survey month. . * (It retains one survey date and one calendar-month value, . * so the value label can be built either using the existing . * label for 'Month', or the SPSS 'MONTHn' format.) . DATASET DECLARE CodeGen. AGGREGATE OUTFILE=CodeGen /BREAK=SvyMonth /FrstDate 'To have one date value in the result' = MIN(Date) /CalndrMn 'To have the old "Month" value in the result' = MIN(Month) /NResps 'Number of responses - I''m always curious' = NU. DATASET ACTIVATE CodeGen WINDOW=FRONT. * (The next line has nothing to do with the purpose of this code) . . COMPUTE NResps = NResps - 1 /* Discount the dummy record */. * Check an error that baffled me for a while. ...... . * For resolution, see I.C., below. ...... . . /*-- DISPLAY DICTIONARY /VARIABLES=CalndrMn /*-*/. . /*-- TEMPORARY /*-*/. . /*-- STRING OldLabel (A5) /*-*/. . /*-- COMPUTE OldLabel=VALUELABEL(CalndrMn) /*-*/. . /*-- LIST /*-*/. * I.C Fill in "date" and calendar month, for survey months with . * no observed responses. . NUMERIC NoBreak (F2). COMPUTE NoBreak = 1. AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=NOBREAK /Ur_Month 'First survey month with any response' = MIN(SvyMonth) /Ur_Date 'The very earliest response date' = MIN(FrstDate). NUMERIC #Ur_Day1 (DATE11) /* First day of first response month */ #SvyDay1 (DATE11) /* First day of first month of survey */. DO IF MISSING(FrstDate). . COMPUTE #Ur_Day1 = DATE.MDY(XDATE.MONTH(Ur_Date), 1, XDATE.YEAR (Ur_Date)). * Filled-in date is the first day of the survey month ....... . . COMPUTE #SvyDay1 = DATESUM(#Ur_Day1,(1-Ur_Month),"Months"). . COMPUTE FrstDate = DATESUM(#Ur_Day1,(SvyMonth-1),"Months"). . COMPUTE CalndrMn = XDATE.MONTH(FrstDate). END IF. . /**/ LIST /*-*/. * II. GENERATE THE CODE, and write to the external file . ADD FILES /FILE = * /BY NoBreak /FIRST= ItBegins /LAST = ItEnds. STRING #Syntax (A60) /* To build the output line in */. DO IF ItBegins /* The command header line: */. . COMPUTE #Syntax = "ADD VALUE LABELS SvyMonth". . WRITE OUTFILE=Syntax /#Syntax. END IF. * .... /* The label lines: */. ** (In quest for clearer code, this uses more **. ** variables and lines than strictly necessary.) **. STRING #Label (A5) #LblMn (A3) #LblYr (A4). * Month name for label, using SPSS "MONTHn" format **. *xxCOMPUTE #LblMn= STRING(FrstDate,MONTH3). * Month name for label, using existing value labels**. + COMPUTE #LblMn= VALUELABEL(CalndrMn). * Two-digit year, for label **. COMPUTE #LblYr= STRING(XDATE.YEAR(FrstDate),N4). COMPUTE #LblYr= SUBSTR(#LblYr,3). * The complete label text: **. COMPUTE #Label= CONCAT(#LblMn,#LblYr). COMPUTE #Syntax = CONCAT(" " , STRING(SvyMonth,F3), " '" , #Label , "'" ). DO IF ItEnds /* Add period to the last line:*/. . COMPUTE #Syntax = CONCAT(RTRIM(#Syntax),"."). END IF. + WRITE OUTFILE=Syntax / #Syntax. EXECUTE /* An instance where EXECUTE is needed */. * III. TRACE AND TEST: List the generated code ............ . + /**/ DATA LIST FIXED /*-*/ /**/ FILE=Syntax / /*-*/ /**/ CodeLine 01-60(A) /*-*/. + /**/ LIST /*-*/. * IV. INCLUDE AND EXECUTE the code ............ . DATASET ACTIVATE TestData. INSERT File=Syntax. DISPLAY DICTIONARY /VARIABLES=SvyMonth. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by King Douglas
Shalom
Here is another approach witch is using the autorecode apply template feature . The program generate a SPSS-format data file (see the spss manual under Template File Format in the autorecode command). The manual also state that creating SPSS-format file is not tested . After creating the file with the spss format you can apply it to any variable using the autorecode command . As you can see the syntax is vary short . To test the program change the file name in the get command to any file that have date variable and rename that variable labdate . title "Automate Add Value Labels" . *** use file with dates to get first and last date ***** , get file='P:\Project\sprinkles\spss\lab.sav' /keep=tz labdate . compute dummi=1. AGGREGATE /OUTFILE=* /BREAK=dummi /mindate = MIN(labdate) /maxdate = MAX(labdate). compute last_my=datediff(maxdate,mindate,'month'). string strdate(a6) . loop i=1 to last_my . COMPUTE tmpdate =DATESUM(mindate, i, "months", "closest"). compute strdate=string(tmpdate,MOYR6) . **** save a template file **** . xsave outfile='c:\temp\var1_template.sat'/ rename( strdate i = Source_ Target_ ) / keep=Source_ Target_ . xsave outfile='c:\temp\dates1.sav'/ keep=strdate i . end loop. execute . get file='c:\temp\dates1.sav' . **** apply the template to a variable *** . autorecode strdate / into idate /applay TEMPLATE='c:\temp\var1_template.sat'. *** see the generated value for idate in the variable view window ***** . Hillel vardi BGU King Douglas wrote: > Folks, > > I'm working on a way to automate an Add Values statement for an automatically generated numeric variable for month (1 to 24) that represents a selected time period across a number of files. > > For instance, say I have quarterly and monthly tracking survey results going back a number of years. The survey dates are represented in three numeric variables, Year, Month, Day. > > I can create the new variable values (1-24) with ordinary syntax, but now want to automatically add value labels ("MoYr") based on the 24-month sequence. > > The "Month" variable already contains value labels (Jan, Feb...Dec). I think I could use Python to extract the labels and simply append the year, writing the results to a syntax file that I can insert into my program. > > Now that I have theoretically solved this problem, I'll get to work. :>) > If anyone has any handy code or other suggestions while I'm working at it, that would be much appreciated. > > King Douglas > American Airlines Customer Research > > ===================== > To manage your subscription to SPSSX-L, send a message to > [hidden email] (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD > > ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
| Free forum by Nabble | Edit this page |
