Well said, Thara. I concur. Although his style may sometimes be a bit robust, one can learn a lot from David's posts. I know I have.
Cheers, Bruce p.s. - I am writing from a right-side-up position. ;-)
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Thanks Thara and Bruce.
I suppose my clue stick could use a bit more padding, especially for folks with a thin skin. ;-) --
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Jack Noone
At 11:04 PM 11/13/2012, Jack Noone wrote:
>You may remember the thread below. The syntax you wrote was perfect,
>however I need to keep some other variables as well and I can't seem
>to figure out how to do it. Here is the piece of syntax in question.
>
>GET FILE="/Users/jacknoone/desktop/expanded file.sav".
>AGGREGATE
>  /OUTFILE=*
>  /BREAK =P_ID year
>  /JinYear 'Number of jobs held in calendar year'=NU
>  /Job_SES 'Highest job SES in calendar year' =MAX(Job_SES).
>
>And at 05:58 PM 11/15/2012, Jack Noone wrote:
>Here is what the data looks like prior to AGG BREAK.
>
>P_ID  year  job  yr_start  yr_stop  job_SES  self_employed
>1     1964  1    1964      1965     48.4     1
>1     1965  1    1964      1965     48.4     1
>1     1965  2    1965      1967     48.4     0
>1     1965  2    1965      1967     48.4     0
>1     1965  2    1965      1967     48.4     0
>1     1967  3    1967      1969     48.4     1
>1     1967  3    1967      1969     48.4     1
>1     1968  4    1968      1969     48.4     0
>1     1969  4    1968      1969     48.4     0
>1     1969  5    1969      1974     83.7     1
>1     1969  5    1969      1974     83.7     1

I thought this might come up: you want not just the highest SES for the year, but some other values from the job for which the SES was highest. That's a little less elegant, and I'm not testing this code, but see if this works:

loop year = year_start TO year_end.
.  xsave
     outfile = "/Users/jacknoone/desktop/expanded file.sav" /
     keep    = P_ID year Job_number Job_SES self_employed.
end loop.
execute.

This is the same as the previous XSAVE loop, *except* adding 'self-employed' to the KEEP list. Add any other variables you want to keep, as well.

Now for the AGGREGATE, and here's where it gets a little less elegant. Note the SORT CASES, and the use of FIRST instead of MAX for Job_SES (and 'self_employed').

GET FILE="/Users/jacknoone/desktop/expanded file.sav".
SORT CASES BY P_ID (A) year (A) Job_SES (D).
AGGREGATE
  /OUTFILE=*
  /BREAK =P_ID year
  /JinYear 'Number of jobs held in calendar year'
                     =NU
  /Job_SES 'Highest job SES in calendar year'
                     =FIRST(Job_SES)
  /self_employed 'Self-employed in that job?'
                     =FIRST(self_employed).

Contrary to one piece of advice, you definitely do not want MODE=ADDVARIABLES. That mode gives you one output record for each input record; you want to summarize to one output record per year of employment.

And, again, I'm afraid this is not tested.

-Best wishes,
 Richard

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
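The effect of sorting by descending Job_SES and then taking FIRST can be checked on a tiny file before running it on the real data. The sketch below is an illustration only: it uses a cut-down version of the 1969 rows from the sample data (one row per job, and only the variables the AGGREGATE needs), typed in with DATA LIST rather than read from the expanded file.

```spss
* Sketch only: two jobs held in 1969, one row per job.
DATA LIST FREE / P_ID year job Job_SES self_employed.
BEGIN DATA
1 1969 4 48.4 0
1 1969 5 83.7 1
END DATA.
* Highest SES first within each person and year.
SORT CASES BY P_ID (A) year (A) Job_SES (D).
AGGREGATE
  /OUTFILE=*
  /BREAK=P_ID year
  /JinYear 'Number of jobs held in calendar year' = NU
  /Job_SES 'Highest job SES in calendar year' = FIRST(Job_SES)
  /self_employed 'Self-employed in that job?' = FIRST(self_employed).
LIST.
```

If the logic holds, LIST should show a single 1969 record with JinYear 2, Job_SES 83.7 and self_employed 1, that is, the values from job 5 rather than a mix of the two jobs.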
In reply to this post by David Marso
At 12:36 PM 11/16/2012, David Marso wrote:
>Thanks Thara and Bruce.
>I suppose my clue stick could use a bit more padding, especially for folks
>with a thin skin.

Hard to see what harm it could do, especially since being in the position of asking for help, often enough under time or boss or other pressure, tends to thin the skin a bit.
|
In reply to this post by Richard Ristow
Hi Richard,
The syntax worked perfectly! Thanks so much for that - hugely appreciated.

I wonder if you would be able to help me with the other problem I was having, with missing values? First, I converted the long file to wide like so:

SORT CASES BY P_ID .
CASESTOVARS
  /ID=P_ID
  /GROUPBY=INDEX.

And after sorting the variables by name I end up with something like

P_ID  Job_SES.1  job_SES.2  job_ses.3  job_ses.4  Job_ses.55
1     45.6       .          .          50.7       64.4
2     34.2       34.2       34.2       60.4       60.4

For participant 1 (and any other, of course), I need to replace the missing values in job_SES.2 and job_ses.3 with the SES value from their previous job. In other words, if the person is out of work for a year, I substitute the SES value from their previous job.

David suggested the following syntax, which "appears" to run (i.e. the SPSS output says it has worked). However, the missing data have not been filled in the dataset. I've tried this on another computer as well, with the same result.

VECTOR V=job_ses.1 TO job_ses.55.
LOOP #=2 TO 10.
IF MISSING(V(#)) V(#)=V(#-1).
END LOOP.
Execute.

Any ideas? Any alternative? Please note that this is part of a separate research question and doesn't require data on self-employment etc.

Cheers,
Jack

Dr. Jack Noone
Research Fellow & LHH/ABBA Project Manager
Ageing, Work and Health Research Unit
Faculty of Health Sciences
University of Sydney
Ph: 02 9351 9411
|
What did I previously say about blindly running donated code?
------
David suggested the following syntax, which "appears" to run (I.e. The SPSS output says it has worked). However, the missing data has not been filled in the dataset. I've tried this on another computer as well with the same result.

VECTOR V=job_ses.1 TO job_ses.*55*.
LOOP #=2 TO *10*.
IF MISSING(V(#)) V(#)=V(#-1).
END LOOP.
Execute.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
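The two values marked in the quoted syntax disagree: the vector spans 55 variables, but the loop stops at element 10, so nothing beyond the tenth variable is ever examined. A minimal sketch of a loop that matches the vector follows. It assumes the 55 variables sit next to each other in the file in numeric order (worth checking after a sort by name), and it uses #i as the scratch index instead of the bare #. It has not been run against Jack's file.

```spss
* Sketch only: carry the previous value forward across the whole vector.
* Assumes job_ses.1 to job_ses.55 are contiguous and in numeric order.
VECTOR V = job_ses.1 TO job_ses.55.
LOOP #i = 2 TO 55.
  IF MISSING(V(#i)) V(#i) = V(#i - 1).
END LOOP.
EXECUTE.
```

Because each element is filled from the one before it, a run of consecutive gaps is filled from the last non-missing value. A value that is missing in job_ses.1 stays missing, since there is no earlier job to copy from.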
*Hangs head in shame*
Fond regards :)
Jack
|
In reply to this post by Richard Ristow
Hello again - I've struck *another* problem I hope you can help with.

I've only now realized that people started their first job at different times, ranging from 1954 to 1985. So creating an SES indicator for each year of their working life won't allow me to create an equal number of time points across participants (for instance, one person may have been in work for 20 years while another has been working for 40).

To solve this problem, I would like to create a list of variables to reflect average SES over particular points in time:

P_ID SES_1960_1970 SES_1971_1980 SES_1981_1990 SES_1991_2000 SES_2001-2011

If a participant has not entered the workforce by 1970, I would substitute an SES score according to their education (happy to share the ref if anyone is interested). I'm really not sure about the best way to do this.

Just as a reminder, my long data file has the following variables (AUSEI06_2 = job_ses):

P_ID year jinyear AUSEI06_2 self_employ

Any ideas? I could cope if we couldn't include the self_employ variable.

Thanks,
Jack
|
In reply to this post by Jack Noone
At 08:05 PM 11/21/2012, Jack Noone asked about creating a 'wide' data file in which there's a record for each subject, with a summary variable for each calendar decade (see below). I'm not comfortable answering that; this is your study, not mine, but I'm wondering if the results of doing so would be meaningful.

You wrote:

>People [in my study] started their first job at different times
>ranging from 1954 to 1985. So creating a SES indicator for each year
>of their working life won't allow me to create an equal number of
>time points across participants (for instance one person may have
>been in work for 20 years while another has been working for 40).

This raises questions, not about using SPSS, but about your study and how you view it. You haven't said what questions you're trying to answer; that's OK, when you start with a specific data-handling question, but at this point, it will help us if you do say what you're looking for.

Your data structure (by which I mean, the underlying organization, not how it happens to be structured in files) is a set of subjects, individuals, for each of whom you have information over a time series -- years of employment. The lengths of the time series vary among your subjects.

Further, there are at least two meaningful ways to index the years within an employment history: by calendar year, and by year within each subject's employment history. Since you're putting your data in a 'wide' organization, I'll guess that you're going to do comparisons across subjects. In those comparisons, indexing values by calendar year gives very different comparisons than does indexing by employment year (or, by the way, by year of age). Which indexing to use depends critically on what questions you want to answer.

You wrote,

>Creating a SES indicator for each year of their working life won't
>allow me to create an equal number of time points across participants.

No, it won't; and this is inherent in your data, not something that can be solved in SPSS.

You said,

>To solve this problem, I would like to create a list of variables to
>reflect average SES over particular points in time:
>
>P_ID SES_1960_1970 SES_1971_1980 SES_1981_1990 SES_1991_2000 SES_2001-2011

It wouldn't be too hard to create these variables that summarize by decade, though you'd have to say what SES value to use for the decade: The highest? The mean of all values that occur in the decade?

But I don't see that even this would give you time series all of the same length; for participants who hadn't worked before 1970, or hadn't after 2001, you'd have some missing values.

>If a participant has not entered the workforce by 1970, I would
>substitute in a SES score according to their education (happy to
>share the ref if anyone is interested). I'm really not sure about
>the best way to do this.

It sounds like this is an accepted practice in your field. Fine, if it is; from an outside perspective, it sounds doubtful -- are SES from education and from employment level really the same thing?

And, is it really important that all the subjects have time series of the same length? Many analytic methods, for example mixed linear models, don't require that. And when your time series inherently differ in length, and starting and ending period, making them all the same length with a data transformation seems a doubtful idea to me -- admittedly, not in your field; indeed, not even knowing what your field is.

Forgive my going afield, but for the reasons I've given, I'm not comfortable answering your last question simply on its own terms.
|
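The difference between indexing by calendar year and indexing by year within the employment history can be made concrete in the long file. The following is a minimal, untested sketch. It assumes one record per P_ID and year, and the names first_year and emp_year are hypothetical, introduced only for illustration.

```spss
* Sketch only; first_year and emp_year are hypothetical names.
* Add each subject's first employment year to every one of their records.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=P_ID
  /first_year = MIN(year).
* Year within the employment history: 1 for the first year worked.
COMPUTE emp_year = year - first_year + 1.
EXECUTE.
```

MODE=ADDVARIABLES fits here because nothing is being summarized to fewer records; the per-subject minimum is simply attached to each existing record. Two subjects who started work in 1954 and 1985 then both have emp_year = 1 for their first year, although their calendar years differ by three decades.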
Hi Richard, hi all.
These are all good points you raise and I've responded to them below.

On 27/11/12 11:26 AM, "Richard Ristow" <[hidden email]> wrote:

>At 08:05 PM 11/21/2012, Jack Noone asked about creating a 'wide' data
>file in which there's a record for each subject, with a summary
>variable for each calendar decade (see below). I'm not comfortable
>answering that; this is your study, not mine, but I'm wondering if
>the results of doing so would be meaningful.
>
>You wrote:
>
>>People [in my study] started their first job at different times
>>ranging from 1954 to 1985. So creating a SES indicator for each year
>>of their working life won't allow me to create an equal number of
>>time points across participants (for instance one person may have
>>been in work for 20 years while another has been working for 40).
>
>This raises questions, not about using SPSS, but about your study and
>how you view it. You haven't said what questions you're trying to
>answer; that's OK, when you start with a specific data-handling
>question, but at this point, it will help us if you do say what
>you're looking for.

Certainly, although I am not at liberty to discuss the project in too much detail. We are examining the relationship between socioeconomic status and health across the life course. The aim is to create a socioeconomic trajectory starting from early childhood to late-middle age for each participant and then compare it with their health trajectory over the same period.

>Your data structure (by which I mean, the underlying organization,
>not how it happens to be structured in files) is a set of subjects,
>individuals, for each of whom you have information over a time series
>-- years of employment. The lengths of the time series vary among
>your subjects.

Correct. To measure the participants' SES in early childhood, we have the choice of using father's occupation, mother's occupation, and/or a number of other proxies.

>Further, there are at least two meaningful ways to index the years
>within an employment history: by calendar year, and by year within
>each subject's employment history. Since you're putting your data in
>a 'wide' organization, I'll guess that you're going to do comparisons
>across subjects.

Across and within - probably using Latent Class Growth Analysis.

>In those comparisons, indexing values by calendar
>year gives very different comparisons than does indexing by
>employment year (or, by the way, by year of age). Which indexing to
>use depends critically on what questions you want to answer.

Very good point. As we are looking at SES across the life course, creating an SES index according to age seems like the best option to me (not according to decade as I state below - that was not a good idea). The participants' ages are similar enough (60-64) to control for age effects.

>You wrote,
>
>>Creating a SES indicator for each year of their working life won't
>>allow me to create an equal number of time points across participants.
>
>No, it won't; and this is inherent in your data, not something that
>can be solved in SPSS.

Completely understood.

>You said,
>
>>To solve this problem, I would like to create a list of variables to
>>reflect average SES over particular points in time:
>>
>>P_ID SES_1960_1970 SES_1971_1980 SES_1981_1990 SES_1991_2000
>>SES_2001-2011

As noted above, it would make more sense to index by age (e.g. SES_16_25, SES_26-35).

>It wouldn't be too hard to create these variables that summarize by
>decade, though you'd have to say what SES value to use for the
>decade: The highest? The mean of all values that occur in the decade?

The mean of all values that occur in a decade, but I am open to suggestions.

>But I don't see that even this would give you time series all of the
>same length; for participants who hadn't worked before 1970, or
>hadn't after 2001, you'd have some missing values.

True.

>>If a participant has not entered the workforce by 1970, I would
>>substitute in a SES score according to their education (happy to
>>share the ref if anyone is interested). I'm really not sure about
>>the best way to do this.
>
>It sounds like this is an accepted practice in your field. Fine, if
>it is; from an outside perspective, it sounds doubtful -- are SES
>from education and from employment level really the same thing?

The most accepted way of measuring SES depends on how old the person is. To measure childhood SES we usually look at the parents' occupation or education. To measure SES in the late teens and early 20s, education is probably a good option. Through adulthood, one may look at personal income, household income, occupation etc. Of course, they are all just proxies for SES, with none of them capturing the complexities of the construct on its own.

So it makes sense to me to use education as a proxy for SES at age 16-25 if they have not yet entered the workforce (or if they entered the workforce at, say, age 20). Another alternative would be to use the participants' partner's occupation as a proxy for their own SES (assuming they have a partner, of course), but I don't have access to this information.

>And, is it really important that all the subjects have time series of
>the same length? Many analytic methods, for example mixed linear
>models, don't require that. And when your time series inherently
>differ in length, and starting and ending period, making them all the
>same length with a data transformation seems a doubtful idea to me --
>admittedly, not in your field; indeed, not even knowing what your field
>is.

I think if the SES trajectory is based on age, the trajectories will all be the same length. Well, give or take four years, seeing as our participants' ages range from 60 to 64.

>Forgive my going afield, but for the reasons I've given, I'm not
>comfortable answering your last question simply on its own terms.

Not a worry. I appreciate your help so much and I understand why you needed more information. I'm also happy to provide more information as well!

Regards,
Jack
|
At 03:15 AM 12/1/2012, Jack Noone wrote:
>We are examining the relationship between socioeconomic status and
>health across the life course. The aim is to create a socioeconomic
>trajectory starting from early childhood to late-middle age for each
>participant and then compare it with their health trajectory over
>the same period.

Thank you! This is enough detail to go a good ways toward addressing your problem.

I'd written,

>>Indexing values by calendar year gives very different comparisons
>>than does indexing by employment year (or, by the way, by year of
>>age). Which indexing to use depends critically on what questions
>>you want to answer.

and you make clear that you eventually want the values indexed by subject's calendar age.

The code I posted previously gives you the occupation-based SES for each subject for each calendar year covered by one of their employment records. Given how you receive your data, it's best and easiest to start with this step. That leaves, however,

1. Converting calendar year to subject's age
2. Filling in gaps for years not covered by any employment record for the subject
and (new problem)
3. Filling in parent-based or education-based SES for years before the commencement of employment.

The first is easy if you have all subjects' birth years: use MATCH FILES to add the year of birth to each record, then subtract birth year from calendar year to get subject's age. (Keep the calendar year in the record, though, for checking and possible analysis.)

You've requested and received answers about how to do the second with the employment history converted to 'wide' format using CASESTOVARS. Alternatively (and it's probably what I'd do myself) you can use XSAVE logic to create a long-form file with the gaps filled in from the SES of the last previous year with employment. (You'd actually do this before you do step 1.)

Finally, filling in SES from other sources depends on what other sources you have. If you have, for example, a single parental SES, it's easy to create a record containing that SES for each year of the subject's life, and then to drop those records for the year of, and all years after, the first employment record. SES based on subject's education may be relevant only after the education is completed; you'll know best about that.

But you'll end up with a 'long' file that has an SES, and the source of that SES (keep these), computed by methods standard in your field, for each year from the subject's birth to the last year for which they have an employment record.

After that, decide what you want to do. If you're going to compare with health, I'd add the health by year to this long file, before converting to wide form. (Depending on analysis, you may or may not *ever* convert to wide form. You'll know that better than I, except I'm always recommending keeping data in 'long' form -- my taste, and advice that is often, but not invariably, right.)

You wrote,

>It would make more sense to index [and create by-decade summary
>variables] by age (e.g. SES_16_25, SES_26-35).
>
>>You'd have to say what SES value to use for the decade: The
>>highest? The mean of all values that occur in the decade?
>
>The mean of all values that occur in a decade, but I am open to suggestions.

This also could be done in the long-form file (code not tested):

* Label each record with the age band it falls in.
STRING DECADE (A5).
RECODE AGE
   (00 THRU 15 = '00_15')
   (16 THRU 25 = '16_25')
   ...
   (66 THRU HI = '66_up')
   INTO DECADE.

* One record per subject and age band, holding the mean SES in that band.
DATASET DECLARE BY_Decade.
AGGREGATE
   /OUTFILE=BY_Decade
   /BREAK=P_ID DECADE
   /SES = MEAN(SES).

The result would be suitable for putting into 'wide' form with CASESTOVARS, using an /INDEX=DECADE subcommand.

This leaves plenty of questions, but I think it'll get you a step farther.

-Best wishes,
 Richard
|
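Step 1 of Richard's list, converting calendar year to age with MATCH FILES, might look like the sketch below. It is untested, and it assumes a lookup file that the thread never names: 'birth_years.sav' and its variable birth_year are hypothetical, standing in for wherever the subjects' birth years are actually stored.

```spss
* Sketch only; 'birth_years.sav' and birth_year are hypothetical.
* The lookup file must hold one record per P_ID, sorted by P_ID.
SORT CASES BY P_ID year.
MATCH FILES
  /FILE=*
  /TABLE='birth_years.sav'
  /BY P_ID.
* Age reached during the calendar year; year itself stays in the file.
COMPUTE AGE = year - birth_year.
EXECUTE.
```

Using /TABLE rather than a second /FILE treats the birth-year file as a lookup, so each subject's single birth year is spread across all of that subject's yearly records without adding any cases.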
Thanks so much Richard, you're a credit to the List.
This gives me plenty to go on.
Jack
|
In reply to this post by Richard Ristow
Right on!
Basic concept? Normalization!
---
"I'm always recommending keeping data in 'long' form -- my taste, and advice that is often, but not invariably, right."
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |