Dear SPSSers,
First of all, I'd like to express my gratitude to all those who answered my question about how to do Poisson regression with SPSS. I would very much appreciate it if someone could answer this one, about creating age categories with a macro.

I have a file which contains estimates and projections of the population of the province of Quebec from 1981 to 2026. The file has one variable for each age: MoinsUnan (under 1 year), p_1, p_2, p_3 ... p_100. What I'm looking for is a macro that would allow me to define any number of age categories.

Let's say that I'd like to create a file with 20 variables, one for each of 20 age categories:

  first var  = MoinsUnAn
  second var = p_1a4
  third var  = p_5a9
  fourth var = p_10a14
  ...
  20th var   = 90yearsandover

Let's assume that I would then like to create another file with 5 variables corresponding to 5 age categories:

  first var  = 0a14
  second var = 15a29
  third var  = 30a44
  fourth var = 45a64
  fifth var  = 65andover

I've been struggling with this problem for more than a day now, but I can't see the solution. Thanks in advance to anyone who could give me some help.

Yves Therriault, Ph.D.
Canada
Yves,
I'm a bit confused by your question. At first, I thought you wanted to do a 'variable condensation', where you condense the data in 10 variables, age1 to age10, into one new variable. But given what you are working with ('...estimates and projections of Quebec province population from 1981 to 2026'), I think each variable must have data in it, i.e., age1 is the number of one-year-olds, etc. So now I think you just want to rename the variables so that the new names have a meaningful structure. If this is so, I'd do it with a RENAME VARIABLES command. Perhaps you've already thought of this and believe a macro would give you greater flexibility. If so, perhaps a macro expert can help.

Gene Maguin
In reply to this post by Yves_Therriault
On Fri, 28 Jul 2006 09:50:44 -0400, [hidden email] wrote:
>Yves,
>
>I'm a bit confused by your question. At first, I thought you wanted to do a
>'variable condensation', where you condense the data in 10 variables, age1
>to age10, into one new variable. But given what you are working with
>('...estimates and projections of Quebec province population from 1981 to
>2026'), I think each variable must have data in it, i.e., age1 is the number
>of one-year-olds, etc. So now I think you just want to rename the variables
>so that the new names have a meaningful structure. If this is so, I'd do
>it with a RENAME VARIABLES command. Perhaps you've already thought of this
>and believe a macro would give you greater flexibility. If so, perhaps
>a macro expert can help.

Hi Gene and all SPSSers,

I'm really sorry for not having posted a clear question. English isn't my native tongue, so I'm not as fluent in English as I would like.

First of all, let me explain why I need a macro to build age categories "on the fly". I've already written many sets of macros to compute statistics such as age-adjusted death rates, age-adjusted YPLL rates, and so forth. Each set has 3 macros specially designed for my region.

For example, in the case of age-adjusted death rates, the first macro computes the total number of deaths observed over a period of x years for each of the sub-territories of the North Shore region and for the province of Quebec as a whole, the mean number of deaths per year, and the annual crude death rate for each age category. The death data are taken from a file provided by the Quebec government. To compute the death rate, the macro calls a population file that gives the number of people in each age category for the North Shore, its sub-territories and the province of Quebec at the mid-point of the period chosen for the analysis.

The second macro computes the age-standardized death rates for each territory. Among other things, it calls the standard population (for instance, the 2001 Quebec population).

The third macro computes the confidence intervals of the age-standardized death rates and checks whether there is any significant difference between the age-standardized death rate of the province of Quebec and those of the North Shore and its sub-territories.

I usually work with 20 age categories: under 1 year, 1 to 4, 5 to 9, 10 to 14 ... 90 and over. I've already created many SPSS files (one per year for each gender and for the whole population). In those files, I have one variable for each age category: p0, p1à4, p5à9, p10à14 ... p90.

The problem I've been struggling with for more than 2 days is that I'm trying to build a macro that would allow the "creation" of a different number of age categories, since someone might wish to work with age categories other than those I use. Then, instead of working with the population files that I mentioned above, I would call a file that contains estimates and projections of the population of the province of Quebec from 1981 to 2026. As I wrote in my earlier post, this file has one variable per age: MoinsUnan, p_1, p_2, p_3 ... p_100. The variable p_100 covers people who are 100 years old and over.

Let's say that someone would like to work with 5 age categories:

  first cat  = p0a14
  second cat = p15a29
  third cat  = p30a44
  fourth cat = p45a64
  fifth cat  = p65_ans_et_plus

When calling the macro, he or she would have to specify 5 arguments for the age categories:

  !agecat première_cat = 0à14 / seconde_cat = 15à29 / troisième_cat = 30à44 / quatrième_cat = 45à64 / cinquième_cat = 65ans_et_plus.

Hence, for the argument 0à14, the macro would create a variable called p0a14, which would be the sum of the variables MoinsUnan to p_14; for the argument 15à29, the macro would sum the variables p_15 to p_29 and create the variable p15a29; and so on. For the fifth argument, 65ans_et_plus, the macro would create a variable p65_ans_et_plus for the people who are 65 years old and over.

Should someone want to use 20 age categories, he or she would have to use 20 arguments, like:

  !agecat première_cat = 0 / seconde_cat = 1à4 / troisième_cat = 5à9 / .. 20ième_cat = 90_et_plus.

In that case, the macro would create 20 variables, one for each age category: p0, p1a4, p5a9 .. p90.

I hope that I've succeeded in clarifying my earlier post. I would appreciate it if someone could show me the light :-).

P.S. Sorry for the bad English.

Yves Therriault, Ph.D.
Agent de recherche, Direction de la santé publique
Agence de la santé et des services sociaux de la Côte-Nord
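The summing step described in this post can be sketched in SPSS syntax as follows. This is only a minimal illustration, not a solution taken from the thread: the macro name !agecat is borrowed from the post, but the argument names newvar, first and last are hypothetical, and the sketch assumes that MoinsUnan, p_1 ... p_100 are stored next to each other in the file, in that order, so that the TO keyword picks up the right variables.

```spss
* Hypothetical sketch with one macro call per age category.
* Assumes the age variables are contiguous and ordered in the file.
DEFINE !agecat (newvar = !TOKENS(1) / first = !TOKENS(1) / last = !TOKENS(1))
COMPUTE !newvar = SUM(!first TO !last).
!ENDDEFINE.

!agecat newvar = p0a14 first = MoinsUnan last = p_14.
!agecat newvar = p15a29 first = p_15 last = p_29.
!agecat newvar = p30a44 first = p_30 last = p_44.
!agecat newvar = p45a64 first = p_45 last = p_64.
!agecat newvar = p65_ans_et_plus first = p_65 last = p_100.
EXECUTE.
```

With this design the number of categories is not fixed in the macro: a 20-category breakdown would simply use 20 calls with different bounds.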
All,
I must be doing something wrong or misunderstanding the documentation, but ...

These are the frequencies for the variable going into the recode (apologies if tabs are not preserved).

P1RAAC1  A1 Par: Alcohol Abuse Count

                    Frequency  Percent  Valid Percent  Cumulative Percent
Valid    .00               89     13.2           13.6                13.6
         1.00             102     15.1           15.6                29.2
         2.00              58      8.6            8.9                38.1
         ...
         12.00             24      3.6            3.7                98.3
         13.00             11      1.6            1.7               100.0
         Total            654     97.0          100.0
Missing  99.00             19      2.8
         System             1       .1
         Total             20      3.0
Total                     674    100.0

This is the recode statement.

RECODE P1RAAC1(0 1=0)(2=2)(3 THRU 13=3)(MISSING=9).

And this is the result.

P1RAAC1  A1 Par: Alcohol Abuse Count

                             Frequency  Percent  Valid Percent  Cumulative Percent
Valid    .00   0-1 NonAlc          191     28.3           28.4                28.4
         2.00  2 Maybe              58      8.6            8.6                37.0
         3.00  3+ Definite         405     60.1           60.2                97.2
         99.00                      19      2.8            2.8               100.0
         Total                     673     99.9          100.0
Missing  9.00                        1       .1
Total                              674    100.0

As I read the documentation for 13, the 'missing' keyword causes user missing and system missing value to be coded to sysmis on the output. It doesn't look like this is happening here. Comments?

Thanks,
Gene Maguin
In reply to this post by Yves_Therriault
All,
My conclusion sentences were in error. I said: As I read the documentation for 13, the 'missing' keyword causes user missing and system missing value to be coded to sysmis on the output. It doesn't look like this is happening here. Comments? I should have said: As I read the documentation for 13, the 'missing' keyword causes both user missing and system missing values to be coded to the specified value on the output. It doesn't look like the user missing value of 99 was recoded to 9 but should have been. Comments? Gene Maguin |
In reply to this post by Maguin, Eugene
The documentation is unfortunately vague. What the MISSING keyword actually does is recode any user- and system-missing values into the specified value. It does not make that value user-missing (or system-missing). If you want user-missing values to be system-missing, use MISSING=SYSMIS.
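As a hedged illustration of the MISSING=SYSMIS form mentioned above, the recode from the question could be written as below. The target variable name P1RAAC1S is hypothetical and only there so that the source variable is left untouched.

```spss
* Hypothetical target variable P1RAAC1S.
* User-missing and system-missing source values both become system-missing.
RECODE P1RAAC1 (0 1=0) (2=2) (3 THRU 13=3) (MISSING=SYSMIS) INTO P1RAAC1S.
FREQUENCIES VARIABLES=P1RAAC1S.
```

Had the last specification been (MISSING=9) instead, the value 9 would be an ordinary valid value in the result until a MISSING VALUES command declares it user-missing.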
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of emaguin
Sent: Monday, July 31, 2006 12:46 PM
To: [hidden email]
Subject: Recode question
In reply to this post by Yves_Therriault
All,
I think something has been missed in the replies to my question. Going into the recode, 99 was declared to be user missing, as the frequency listing shows. The recode statement ... (missing=9) should have recoded any user or system missing values to 9. When you look at what comes out, you see that one case has a value of 9, which would be the case with a sysmis value going in, and 19 cases have a value of 99. The sysmis part worked as documented. My contention is that those 19 cases, because they were user missing going in, should have been changed to 9 coming out, but they weren't. It is these 19 cases that I am interested in.

I also realize I did a poor job of making clear what I did, because I left out a missing values statement in which I declared that 9 was user missing, and which was positioned after the recode statement and before the frequencies statement.

Gene Maguin
Quoting the CSR:

  "Value specifications are scanned left to right. A value is recoded only once per RECODE command."

So the first specification that matches determines the recode. Overlaps are OK and, in fact, if you have interval recodes, you control how the endpoints are treated by the order of the specifications.

HTH
Jon Peck
SPSS

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of emaguin
Sent: Monday, July 31, 2006 14:01
To: [hidden email]
Subject: Re: [SPSSX-L] Recode question
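The left-to-right rule quoted from the CSR can be illustrated with overlapping intervals. This is a small sketch with hypothetical variable names (score as the source, scoregrp as the result), not syntax from the thread.

```spss
* Hypothetical variables score and scoregrp.
* A score of exactly 10 matches the first specification and becomes 1.
* A score of exactly 20 matches the second specification and becomes 2.
RECODE score (LO THRU 10=1) (10 THRU 20=2) (20 THRU HI=3) INTO scoregrp.
EXECUTE.
```

Reversing the order of the specifications would assign the shared endpoints 10 and 20 to the higher group instead.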
In reply to this post by Maguin, Eugene
I think the problem is the sometimes confusing behavior of MISSING VALUES, which always takes effect immediately, whereas RECODE doesn't execute until the next command that reads the data. This is one of those rare instances where you may need an EXECUTE command.
If your code is currently something like:

  missing values somevar (99).
  recode somevar (value=value)...(missing=9).
  missing values somevar (9).
  frequencies variables=somevar.

That is really no different than:

  missing values somevar (9, 99).
  recode...
  frequencies...

because the MISSING VALUES command that sets 9 to missing takes effect before the preceding RECODE is executed, and I think the RECODE specification of MISSING=9 may be problematic if 9 is already considered user-missing.

Try putting an EXECUTE between the RECODE and the second MISSING VALUES command, as in:

  missing values somevar (99).
  recode somevar (value=value)...(missing=9).
  execute.
  missing values somevar (9).
  frequencies variables=somevar.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of emaguin
Sent: Monday, July 31, 2006 2:01 PM
To: [hidden email]
Subject: Re: Recode question
In reply to this post by Peck, Jon
Greetings:
I have a new set of data for a repeated measures study with three treatment groups. We have a number of demographic factors that we would like to test for equal frequency distribution among the three groups. I have created tables with columns for the treatment groups and rows for the demographic factor. I then include the chi-square test for independence to test for equal distribution. However, for some of these (for example, occupation, which includes a number of categories in which fewer than 5 respondents were represented), the chi-square test for independence cannot be used because the data do not meet the assumption of a minimum expected cell frequency of 5.

Is it acceptable to use the "compare column proportions" test using z-scores, which is also included with Tables, for these comparisons? I know these are a series of pairwise comparisons for each pair of treatments, but they are Bonferroni-protected and appear to give me the answers that I need. I am just not certain that this is a correct approach, as I typically would use chi-square for this type of testing.

Thanks for your help!

Linda Case

Linda P. Case
AutumnGold Consulting
www.autumngoldconsulting.com
(217) 586-4864
[hidden email]
Why not use Fisher's exact test? That way you won't have to worry about having cells with few respondents.

David Greenberg, Sociology Department, New York University

----- Original Message -----
From: Linda Case <[hidden email]>
Date: Monday, July 31, 2006 5:15 pm
Subject: Custom Tables Question: Use of Chi-Square for Independence vs. Z-Test for Equality of Column Proportions
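One way to request an exact test in SPSS syntax is through CROSSTABS, sketched below. The variable names occupation and treatment are hypothetical, and the METHOD subcommand is assumed to be available, which requires the Exact Tests option to be installed.

```spss
* Hypothetical variables occupation and treatment.
* METHOD=EXACT needs the Exact Tests option and TIMER sets a limit in minutes.
CROSSTABS
  /TABLES=occupation BY treatment
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED
  /METHOD=EXACT TIMER(5).
```

Requesting expected counts alongside the test makes it easy to see which cells fall below the usual threshold of 5.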
In reply to this post by Maguin, Eugene
There have been several answers to this, but I don't think any of them have quite hit the point. Apologies if it's solved, and I missed the solution.

At 03:45 PM 7/31/2006, emaguin wrote:

>This is the frequencies for the variable going in to the recode.
>
>P1RAAC1  A1 Par: Alcohol Abuse Count
>                    Frequency  Percent  Valid    Cumulative
>                                        Percent  Percent
>Valid    .00               89     13.2     13.6        13.6
>         1.00             102     15.1     15.6        29.2
>         2.00              58      8.6      8.9        38.1
>         ...
>         12.00             24      3.6      3.7        98.3
>         13.00             11      1.6      1.7       100.0
>         Total            654     97.0    100.0
>Missing  99.00             19      2.8
>         System             1       .1
>         Total             20      3.0
>Total                     674    100.0
>
>This the recode statement.
>
>RECODE P1RAAC1(0 1=0)(2=2)(3 THRU 13=3)(MISSING=9).
>
>And this is the result
>
>P1RAAC1  A1 Par: Alcohol Abuse Count
>                             Frequency  Percent  Valid    Cumulative
>                                                 Percent  Percent
>Valid    .00   0-1 NonAlc          191     28.3     28.4        28.4
>         2.00  2 Maybe              58      8.6      8.6        37.0
>         3.00  3+ Definite         405     60.1     60.2        97.2
>         99.00                      19      2.8      2.8       100.0
>         Total                     673     99.9    100.0
>Missing  9.00                        1       .1
>Total                              674    100.0

As in Richard Oliver's second posting, it looks like it's a problem with a MISSING VALUES statement. The first FREQUENCIES are what you'd get if 99 is a user-missing value. The second are what you'd get if, for the RECODE and FREQUENCIES, 9 is a user-missing value but 99 is not.

You probably used a statement

  MISSING VALUES P1RAAC1(9).

Replace it with

  MISSING VALUES P1RAAC1(9,99).

and you should get what you want.

Be cautious: RECODEing a source variable into itself is usually a bad idea. In this case, it loses information: you can't tell 0 from 1 anymore, or 2 through 13 apart from each other. Much better to RECODE INTO a new variable, like P1RAAC1A.
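The RECODE INTO suggestion could look like the sketch below. The target name P1RAAC1A comes from the post and the value labels are copied from the output in the question; the rest is an assumed arrangement rather than syntax anyone in the thread posted.

```spss
* Source variable P1RAAC1 keeps its values and its user-missing code 99.
RECODE P1RAAC1 (0 1=0) (2=2) (3 THRU 13=3) (MISSING=9) INTO P1RAAC1A.
VALUE LABELS P1RAAC1A 0 '0-1 NonAlc' 2 '2 Maybe' 3 '3+ Definite'.
MISSING VALUES P1RAAC1A (9).
FREQUENCIES VARIABLES=P1RAAC1 P1RAAC1A.
```

Because the MISSING VALUES command here applies to the new variable only, the missing-value declaration that RECODE relies on for the source variable is not changed before the transformation runs.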
Very simple question:

What is the best and simplest model-free method for comparing several correlation matrices? They are not large: 5 variables, with a minimum of 30 observations per matrix.

Any help appreciated.

Best,
Diana

Professor Diana Kornbrot
Evaluation Co-ordinator, Blended Learning Unit
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
Blended Learning Unit: voice +44 (0) 170 728 1315, fax +44 (0) 170 728 1320
Psychology: voice +44 (0) 170 728 4626, fax +44 (0) 170 728 5073
email: [hidden email]
http://www.psy.herts.ac.uk/pub/D.E.Kornbrot/hmpage.html