SPSSX Discussion

Syntax help - duplicating variables


Bruce Weaver

Re: Syntax help - duplicating variables -

Administrator

Well said, Thara. I concur. Although his style may sometimes be a bit robust, one can learn a lot from David's posts. I know I have.

Cheers,
Bruce

p.s. - I am writing from a right-side-up position. ;-)

Thara Vardhan wrote

Hi Jack

I have been following this discussion quite keenly.

It is a pity that you are reacting so strongly to David's response.

Just wanted to let you know that David is one of the most helpful and
knowledgeable persons on the list.

In fact he is a guru of SPSS syntax in the real sense.

His comments/suggestions and help with syntax go beyond the initial
problem posted by members and thereby help the person think more
carefully and come to the best possible solution and conclusion for the
issue they are working on.

Perhaps you are under a lot of stress right now.

Hopefully this will change your mind about not wanting to interact with
him anymore on this forum. Oh yes I am writing from down under!

cheers
thara vardhan

From: Jack Noone <[hidden email]>
To: [hidden email]
Date: 16/11/2012 11:35
Subject: Re: Syntax help - duplicating variables
Sent by: "SPSSX(r) Discussion" <[hidden email]>

Dear David,

I find your language (e.g. RTFM) completely inappropriate for this forum.

I am doing my best to solve a complex problem within a tight time-frame
and with only limited knowledge of SPSS syntax.

I would appreciate it if you did not respond to any more of my posts
including this one.

Jack.

Dr. Jack Noone
Research Fellow & LHH/ABBA Project Manager
Ageing, Work and Health Research Unit
Faculty of Health Sciences
University of Sydney

Ph: 02 9351 9411

On 16/11/12 1:09 PM, "David Marso" <[hidden email]> wrote:

>Please RTFM re AGGREGATE rather than just blindly running donated code.
>ie MODE=ADDVARIABLES will be useful.
>--
>If Self Employed is a constant for P_ID and Year then you will get
>precisely the result required.
>"There were no error messages but the sorting by Job_SES didn't run.
>HELP!"
>
>"didn't run" is *not* informative! What DID happen????
>Maybe post what did occur???
>
>Run the original AGG per RW using MODE=ADDVAR then do a SELECT IF for the
>MAX(SES).
>
>Point 2.
>See EXECUTE command (so the data pass is performed and the fills are
>populated).
>
>---
>
>Jack Noone wrote
>> Hi David and all,
>>
>> Point 1:
>>
>> Here is what the data looks like prior to AGG BREAK.
>>
>> P_ID year job yr_start yr_stop job_SES self_employed
>> 1 1964 1 1964 1965 48.4 1
>> 1 1965 1 1964 1965 48.4 1
>> 1 1965 2 1965 1967 48.4 0
>> 1 1965 2 1965 1967 48.4 0
>> 1 1965 2 1965 1967 48.4 0
>> 1 1967 3 1967 1969 48.4 1
>> 1 1967 3 1967 1969 48.4 1
>> 1 1968 4 1968 1969 48.4 0
>> 1 1969 4 1968 1969 48.4 0
>> 1 1969 5 1969 1974 83.7 1
>> 1 1969 5 1969 1974 83.7 1
>>
>> And so on
>>
>>
>> However, we can see that people are holding more than one job in a
>> calendar year.
>> So I applied this syntax (℅ R.Ristow) with the aim to have only the
>> highest job_SES for any given year:
>>
>> AGGREGATE
>> /OUTFILE=*
>> /BREAK =P_ID year
>> /JinYear 'Number of jobs held in calendar year'=NU
>> /Job_SES 'Highest job SES in calendar year' =MAX(Job_SES).
>>
>> Which resulted in
>>
>> P_ID year jinyear job_SES
>> 1 1964 1 48.4
>> 1 1965 2 48.4
>> 1 1966 1 48.4
>> 1 1967 2 48.4
>> 1 1968 2 48.4
>> 1 1969 2 83.7
>> 1 1970 1 83.7
>> 1 1971 1 83.7
>>
>> Perfect! In 1969, this participant held one job with a SES rating of 48.4
>> and one with SES rating of 83.7. However, only the higher rating SES value
>> is chosen for 1969.
>>
>> However, I want to know if the person was self-employed for the job that
>> has been selected. So I tried this:
>>
>> AGGREGATE
>> /OUTFILE=*
>> /BREAK =P_ID year self_employed
>> /JinYear 'Number of jobs held in calendar year'=NU
>> /Job_SES 'Highest job SES in calendar year' =MAX(Job_SES).
>>
>> But it didn't work. There were no error messages but the sorting by
>> Job_SES didn't run. HELP!
>>
>>
>> Point 2:
>>
>> I converted the long format file to wide format so I could take a look at
>> the missing data.
>>
>> I then applied this syntax after sorting the variables
>>
>> VECTOR V=Job_ses_1 TO Job_SES_55.
>> LOOP #=2 TO 10.
>> IF MISSING(V(#)) V(#)=V(#-1).
>> END LOOP.
>>
>> There were no errors but the missing data were not filled.
>>
>>
>>
>>
>> Sadly I don't have the knowledge to solve this one myself. HELP AGAIN!
>>
>> Jack
>>
>>
>> On 16/11/12 1:23 AM, "David Marso" <david.marso@> wrote:
>>
>>>Your initial followup:
>>>"I need to keep some other variables as well and I can't seem to figure
>>>out how to do it."
>>>If the values of these additional variables vary over year then you need
>>>to specify how these new variables will be represented in the new data
>>>file.
>>>If they don't vary then everything should be exactly as if they were not
>>>used in the AGG BREAK. Maybe time for you to post how the before/after
>>>(pre aggregate/post aggregated) data appear.
>>>
>>>Point 2:
>>>Data x1...x10
>>>1 1 . . 3 4 5
>>>-----
>>>VECTOR V=V1 TO V10.
>>>LOOP #=2 TO 10.
>>>IF MISSING(V(#)) V(#)=V(#-1).
>>>END LOOP.
>>>----------------
>>>
>>>Jack Noone wrote
>>>> Hi David,
>>>>
>>>> Thanks for your help, but unfortunately the syntax didn't work as I'd
>>>> hoped.
>>>>
>>>> I believe the context for the problem is in the thread below. But, according
>>>> to point number 2 (see bottom of thread), the original syntax was designed
>>>> to "automatically select the highest SES job for any one year" and it did
>>>> this perfectly. Some people had more than one job in a calendar year and I
>>>> wanted to select the job with the highest socioeconomic rating.
>>>>
>>>> But, if I add the other variables under the break command as suggested,
>>>> then the highest SES job for any one year is not selected out. I thought
>>>> that they were constant within the structure, but I now suspect that I
>>>> didn't understand what you meant. Could you elaborate please?
>>>>
>>>> I also have another, somewhat related, syntax query. Having converted my
>>>> long file to wide I end up with a file looking like this:
>>>>
>>>> P_ID ses_yr1 ses_yr2 ses_yr3 ses_yr4 ses_yr5 ses_yr6 .
>>>> 1 34 34 . . 48 48
>>>> 2 48 48 48 75 75 75
>>>>
>>>> This is simply an occupation-based socioeconomic index for each year of my
>>>> participants' working lives - exactly what I wanted. However, I need to
>>>> fill in the missing data by substituting in the last SES score. For
>>>> example, participant 1 was out of the workforce for year 3 and year 4 and
>>>> I would like to substitute in their SES score of 34 (from their last job)
>>>> for the two points of missing data.
>>>>
>>>> I'm sure there is an easy way to do this, but I have no idea how.
>>>>
>>>> Thanks,
>>>>
>>>> Jack
>>>>
>>>>
>>>> On 14/11/12 5:39 PM, "David Marso" <david.marso@> wrote:
>>>>
>>>>>Without reviewing the entire thread:
>>>>>*If* p_id to AUSEI06_3digit are *CONSTANT* within the structure
>>>>>(P_ID * year) simply add them to the list of BREAKS
>>>>>ie
>>>>>AGGREGATE
>>>>> /OUTFILE=*
>>>>> /BREAK =P_ID to AUSEI06_3digit year
>>>>> /JinYear 'Number of jobs held in calendar year'=NU
>>>>> /Job_SES 'Highest job SES in calendar year' =MAX(Job_SES).
>>>>>
>>>>>*Otherwise* your question requires greater specificity.
>>>>>
>>>>>
>>>>>Jack Noone wrote
>>>>>> Hi Richard,
>>>>>>
>>>>>> You may remember the thread below. The syntax you wrote was perfect,
>>>>>> however I need to keep some other variables as well and I can't seem to
>>>>>> figure out how to do it. Here is the piece of syntax in question.
>>>>>>
>>>>>>
>>>>>> GET FILE="/Users/jacknoone/desktop/expanded file.sav".
>>>>>> AGGREGATE
>>>>>> /OUTFILE=*
>>>>>> /BREAK =P_ID year
>>>>>> /JinYear 'Number of jobs held in calendar year'=NU
>>>>>> /Job_SES 'Highest job SES in calendar year' =MAX(Job_SES).
>>>>>>
>>>>>> So, how would I fit
>>>>>> keep = p_id to AUSEI06_3digit.
>>>>>> into the syntax above
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jack
>>>>>>
>>>>
>> <SNIP>
>>>
>>>
>>>
>>>
>>>
>>>-----
>>>Please reply to the list and not to my personal email.
>>>Those desiring my consulting or training services please feel free to
>>>email me.
>>>--
>>>View this message in context:
>>>http://spssx-discussion.1045642.n5.nabble.com/Syntax-help-duplicating-variables-tp5715562p5716214.html
>>>Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>>>
>>>=====================
>>>To manage your subscription to SPSSX-L, send a message to
>>>LISTSERV@.UGA (not to SPSSX-L), with no body text except the
>>>command. To leave the list, send the command
>>>SIGNOFF SPSSX-L
>>>For a list of commands to manage subscriptions, send the command
>>>INFO REFCARD
>
>-----
>Please reply to the list and not to my personal email.
>Those desiring my consulting or training services please feel free to
>email me.
>--
>View this message in context:
>http://spssx-discussion.1045642.n5.nabble.com/Syntax-help-duplicating-variables-tp5715562p5716240.html
>Sent from the SPSSX Discussion mailing list archive at Nabble.com.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

David Marso

Re: Syntax help - duplicating variables -

Administrator

Thanks Thara and Bruce.
I suppose my clue stick could use a bit more padding, especially for folks with a thin skin.
;-)
--


Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
"Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Richard Ristow

Re: Syntax help - duplicating variables

In reply to this post by Jack Noone

At 11:04 PM 11/13/2012, Jack Noone wrote:

>You may remember the thread below. The syntax you wrote was perfect,
>however I need to keep some other variables as well and I can't seem
>to figure out how to do it. Here is the piece of syntax in question.
>
>GET FILE="/Users/jacknoone/desktop/expanded file.sav".
>AGGREGATE
> /OUTFILE=*
> /BREAK =P_ID year
> /JinYear 'Number of jobs held in calendar year'=NU
> /Job_SES 'Highest job SES in calendar year' =MAX(Job_SES).
>
>And at 05:58 PM 11/15/2012, Jack Noone wrote:
>Here is what the data looks like prior to AGG BREAK.
>
>P_ID year job yr_start yr_stop job_SES self_employed
>1 1964 1 1964 1965 48.4 1
>1 1965 1 1964 1965 48.4 1
>1 1965 2 1965 1967 48.4 0
>1 1965 2 1965 1967 48.4 0
>1 1965 2 1965 1967 48.4 0
>1 1967 3 1967 1969 48.4 1
>1 1967 3 1967 1969 48.4 1
>1 1968 4 1968 1969 48.4 0
>1 1969 4 1968 1969 48.4 0
>1 1969 5 1969 1974 83.7 1
>1 1969 5 1969 1974 83.7 1

I thought this might come up: you want not just the highest SES for
the year, but some other values from the job for which the SES was
highest. That's a little less elegant, and I'm not testing this code,
but see if this works:

loop year = year_start TO year_end.
. xsave
outfile = "/Users/jacknoone/desktop/expanded file.sav" /
keep = P_ID year Job_number Job_SES self_employed.
end loop.
execute.

This is the same as the previous XSAVE loop, *except* adding
'self-employed' to the KEEP list. Add any other variables you want to
keep, as well. Now for the AGGREGATE, and here's where it gets a
little less elegant. Note the SORT CASES, and the use of FIRST
instead of MAX for Job_SES (and 'self_employed').

GET FILE="/Users/jacknoone/desktop/expanded file.sav".
SORT CASES BY P_ID (A) year (A) Job_SES (D).
AGGREGATE
/OUTFILE=*
/BREAK =P_ID year
/JinYear 'Number of jobs held in calendar year'
=NU
/Job_SES 'Highest job SES in calendar year'
=FIRST(Job_SES)
/self_employed 'Self-employed in that job?'
=FIRST(self_employed).

Contrary to one piece of advice, you definitely do not want
MODE=ADDVARIABLES. That mode gives you one output record for each
input record; you want to summarize to one output record per year of
employment.

And, again, I'm afraid this is not tested.

-Best wishes,
Richard
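
For comparison, the MODE=ADDVARIABLES route suggested earlier in the thread could be sketched roughly as below. This is untested and only an illustration: the rows are a subset of the sample data posted above, and max_SES is a made-up variable name. Because ADDVARIABLES keeps one record per input record, a SELECT IF is needed afterwards to drop the rows that do not carry the yearly maximum.

```spss
* Sketch only: sample rows copied from the thread, max_SES is a hypothetical name.
DATA LIST LIST / P_ID year job yr_start yr_stop Job_SES self_employed.
BEGIN DATA
1 1964 1 1964 1965 48.4 1
1 1965 1 1964 1965 48.4 1
1 1965 2 1965 1967 48.4 0
1 1969 4 1968 1969 48.4 0
1 1969 5 1969 1974 83.7 1
END DATA.

* Attach the yearly maximum to every input row. No rows are dropped at this step.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=P_ID year
  /max_SES 'Highest job SES in calendar year'=MAX(Job_SES).

* Keep only the rows whose SES equals the yearly maximum.
SELECT IF (Job_SES = max_SES).
LIST.
```

In the 1965 rows two different jobs share the same Job_SES, so both would survive the SELECT IF. The SORT CASES plus FIRST approach above avoids that, since a plain AGGREGATE returns a single record per P_ID and year.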


Richard Ristow

Re: Syntax help - duplicating variables

In reply to this post by David Marso

At 12:36 PM 11/16/2012, David Marso wrote:

>Thanks Thara and Bruce.
>I suppose my clue stick could use a bit more padding, especially for folks
>with a thin skin.

Hard to see what harm it could do; especially, since being in the
position of asking for help, often enough under time or boss or other
pressure, often thins the skin a bit.


Jack Noone

Re: Syntax help - duplicating variables

In reply to this post by Richard Ristow

Hi Richard,

The syntax worked perfectly! Thanks so much for that - hugely appreciated.

I wonder if you would be able to help me with the other problem I was
having with missing variables?

Firstly, I converted the long file to wide like so:

SORT CASES BY P_ID .
CASESTOVARS
/ID=P_ID
/GROUPBY=INDEX.

And after sorting the variables by name I end up with something like

P_ID Job_SES.1 job_SES.2 job_ses.3 job_ses.4 Job_ses.55
1 45.6 . . 50.7 64.4
2 34.2 34.2 34.2 60.4 60.4

For participant 1 (and any other of course), I need to substitute the
missing values in job_SES.2 and job_ses.3 for the SES value in their last
job. In other words, if the person is out of work for a year, then I
substitute in the SES value from their previous job.

David suggested the following syntax, which "appears" to run (i.e. the
SPSS output says it has worked). However, the missing data has not been
filled in the dataset. I've tried this on another computer as well with
the same result.

VECTOR V=job_ses.1 TO job_ses.55.
LOOP #=2 TO 10.
IF MISSING(V(#)) V(#)=V(#-1).
END LOOP.
Execute.

Any ideas? Any alternative?

Please note that this is part of a separate research question and doesn't
require data on self-employment etc.

Cheers,

Jack

Dr. Jack Noone
Research Fellow & LHH/ABBA Project Manager
Ageing, Work and Health Research Unit
Faculty of Health Sciences
University of Sydney

Ph: 02 9351 9411


David Marso

Re: Syntax help - duplicating variables

Administrator

What did I previously say about blindly running donated code?
------
David suggested the following syntax, which "appears" to run (i.e., the
SPSS output says it has worked). However, the missing data has not been
filled in the dataset. I've tried this on another computer as well with
the same result.

VECTOR V=job_ses.1 TO job_ses.*55*.
LOOP #=2 TO *10*.
IF MISSING(V(#)) V(#)=V(#-1).
END LOOP.
Execute.
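
The emphasis above seems to point at a mismatch: the vector covers 55 variables, but the loop stops at element 10, so gaps from job_ses.11 onward would never be filled. A minimal sketch of the likely fix, assuming job_ses.1 through job_ses.55 sit next to each other in the file (untested; the scratch variable name #i is just an illustrative choice):

```spss
* Carry the last non-missing SES forward across all 55 yearly variables.
VECTOR V=job_ses.1 TO job_ses.55.
LOOP #i=2 TO 55.
IF MISSING(V(#i)) V(#i)=V(#i-1).
END LOOP.
EXECUTE.
```

The loop bound has to match the number of vector elements; a leading gap in job_ses.1 itself would stay missing, since there is nothing earlier to copy from.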

Jack Noone

Re: Syntax help - duplicating variables

*Hangs head in shame*

Fond regards :)

Jack

Dr. Jack Noone
Research Fellow & LHH/ABBA Project Manager
Ageing, Work and Health Research Unit
Faculty of Health Sciences
University of Sydney

Ph: 02 9351 9411

On 19/11/12 6:21 PM, "David Marso" <[hidden email]> wrote:

>What did I previously say about blindly running donated code?
>------
>David suggested the following syntax, which "appears" to run (i.e., the
>SPSS output says it has worked). However, the missing data has not been
>filled in the dataset. I've tried this on another computer as well with
>the same result.
>
>VECTOR V=job_ses.1 TO job_ses.**55**.
>LOOP #=2 TO **10**.
>IF MISSING(V(#)) V(#)=V(#-1).
>END LOOP.
>Execute.

Jack Noone

Re: Syntax help - duplicating variables

In reply to this post by Richard Ristow

Hello again - I've struck *another* problem I hope you can help with.

I've only now realized that people started their first job at different
times ranging from 1954 to 1985. So creating a SES indicator for each year
of their working life won't allow me to create an equal number of time
points across participants (for instance one person may have been in work
for 20 years while another has been working for 40).

To solve this problem, I would like to create a list of variables to
reflect average SES over particular points in time:

P_ID SES_1960_1970 SES_1971_1980 SES_1981_1990 SES_1991_2000 SES_2001_2011

If a participant has not entered the workforce by 1970, I would
substitute in a SES score according to their education (happy to share the
ref if anyone is interested). I'm really not sure about the best way to do
this.

Just as a reminder, my Long data file had the following variables
(AUSEI06_2 = job_ses):

P_ID year jinyear AUSEI06_2 self_employ

Any ideas? I could cope if we couldn't include the self_employ variable.

Thanks,

Jack

Dr. Jack Noone
Research Fellow & LHH/ABBA Project Manager
Ageing, Work and Health Research Unit
Faculty of Health Sciences
University of Sydney

Ph: 02 9351 9411

On 19/11/12 10:21 AM, "Richard Ristow" <[hidden email]> wrote:

Richard Ristow

Re: Syntax help - duplicating variables

In reply to this post by Jack Noone

At 08:05 PM 11/21/2012, Jack Noone asked about creating a 'wide' data
file in which there's a record for each subject, with a summary
variable for each calendar decade (see below). I'm not comfortable
answering that; this is your study, not mine, but I'm wondering if
the results of doing so would be meaningful.

You wrote:

>People [in my study] started their first job at different times
>ranging from 1954 to 1985. So creating a SES indicator for each year
>of their working life won't allow me to create an equal number of
>time points across participants (for instance one person may have
>been in work for 20 years while another has been working for 40).

This raises questions, not about using SPSS, but about your study and
how you view it. You haven't said what questions you're trying to
answer; that's OK, when you start with a specific data-handling
question, but at this point, it will help us if you do say what
you're looking for.

Your data structure (by which I mean, the underlying organization,
not how it happens to be structured in files) is a set of subjects,
individuals, for each of whom you have information over a time series
-- years of employment. The lengths of the time series vary among
your subjects.

Further, there are at least two meaningful ways to index the years
within an employment history: by calendar year, and by year within
each subject's employment history. Since you're putting your data in
a 'wide' organization, I'll guess that you're going to do comparisons
across subjects. In those comparisons, indexing values by calendar
year gives very different comparisons than does indexing by
employment year (or, by the way, by year of age). Which indexing to
use depends critically on what questions you want to answer.

You wrote,

>Creating a SES indicator for each year of their working life won't
>allow me to create an equal number of time points across participants.

No, it won't; and this is inherent in your data, not something that
can be solved in SPSS. You said,

>To solve this problem, I would like to create a list of variables to
>reflect average SES over particular points in time:
>
>P_ID SES_1960_1970 SES_1971_1980 SES_1981_1990 SES_1991_2000 SES_2001_2011

It wouldn't be too hard to create these variables that summarize by
decade, though you'd have to say what SES value to use for the
decade: The highest? The mean of all values that occur in the decade?

But I don't see that even this would give you time series all of the
same length; for participants who hadn't worked before 1970, or
hadn't after 2001, you'd have some missing values.

>If a participant has not entered the workforce by 1970, I would
>substitute in a SES score according to their education (happy to
>share the ref if anyone is interested). I'm really not sure about
>the best way to do this.

It sounds like this is an accepted practice in your field. Fine, if
it is; from an outside perspective, it sounds doubtful -- are SES
from education and from employment level really the same thing?

And, is it really important that all the subjects have time series of
the same length? Many analytic methods, for example mixed linear
models, don't require that. And when your time series inherently
differ in length, and starting and ending period, making them all the
same length with a data transformation seems a doubtful idea to me --
admittedly, not in your field; indeed, not even knowing what your field is.

Forgive my going afield, but for the reasons I've given, I'm not
comfortable answering your last question simply on its own terms.

Jack Noone

Re: Syntax help - duplicating variables

Hi Richard, hi all.

These are all good points you raise and I've responded to them below.

On 27/11/12 11:26 AM, "Richard Ristow" <[hidden email]> wrote:

>At 08:05 PM 11/21/2012, Jack Noone asked about creating a 'wide' data
>file in which there's a record for each subject, with a summary
>variable for each calendar decade (see below). I'm not comfortable
>answering that; this is your study, not mine, but I'm wondering if
>the results of doing so would be meaningful.
>
>You wrote:
>
>>People [in my study] started their first job at different times
>>ranging from 1954 to 1985. So creating a SES indicator for each year
>>of their working life won't allow me to create an equal number of
>>time points across participants (for instance one person may have
>>been in work for 20 years while another has been working for 40).
>
>This raises questions, not about using SPSS, but about your study and
>how you view it. You haven't said what questions you're trying to
>answer; that's OK, when you start with a specific data-handling
>question, but at this point, it will help us if you do say what
>you're looking for.

Certainly, although I am not at liberty to discuss the project in too much
detail.

We are examining the relationship between socioeconomic status and health
across the life course. The aim is to create a socioeconomic trajectory
starting from early childhood to late-middle age for each participant and
then compare it with their health trajectory over the same period.

>
>Your data structure (by which I mean, the underlying organization,
>not how it happens to be structured in files) is a set of subjects,
>individuals, for each of whom you have information over a time series
>-- years of employment. The lengths of the time series vary among
>your subjects.

Correct. To measure the participants' SES in early childhood, we have the
choice of using father's occupation, mother's occupation, and/or a number
of other proxies.
>
>Further, there are at least two meaningful ways to index the years
>within an employment history: by calendar year, and by year within
>each subject's employment history. Since you're putting your data in
>a 'wide' organization, I'll guess that you're going to do comparisons
>across subjects.

Across and within - probably using Latent Class Growth Analysis.

>In those comparisons, indexing values by calendar
>year gives very different comparisons than does indexing by
>employment year (or, by the way, by year of age). Which indexing to
>use depends critically on what questions you want to answer.

Very good point. As we are looking at SES across the life course, creating
an SES index according to age seems like the best option to me (not
according to decade as I state below - that was not a good idea). The
participants' ages are similar enough (60-64) to control for age effects.
>
>You wrote,
>
>>Creating a SES indicator for each year of their working life won't
>>allow me to create an equal number of time points across participants.
>
>No, it won't; and this is inherent in your data, not something that
>can be solved in SPSS.

Completely understood.

>You said,
>
>>To solve this problem, I would like to create a list of variables to
>>reflect average SES over particular points in time:
>>
>>P_ID SES_1960_1970 SES_1971_1980 SES_1981_1990 SES_1991_2000
>>SES_2001_2011

As noted above, it would make more sense to index by age (e.g. SES_16_25,
SES_26_35).

>
>It wouldn't be too hard to create these variables that summarize by
>decade, though you'd have to say what SES value to use for the
>decade: The highest? The mean of all values that occur in the decade?

The mean of all values that occur in a decade, but I am open to
suggestions.

>
>But I don't see that even this would give you time series all of the
>same length; for participants who hadn't worked before 1970, or
>hadn't after 2001, you'd have some missing values.

True.
>
>>If a participant has not entered the workforce by 1970, I would
>>substitute in a SES score according to their education (happy to
>>share the ref if anyone is interested). I'm really not sure about
>>the best way to do this.
>
>It sounds like this is an accepted practice in your field. Fine, if
>it is; from an outside perspective, it sounds doubtful -- are SES
>from education and from employment level really the same thing?

The most accepted way of measuring SES depends on how old the person is.
To measure childhood SES we usually look at the parents' occupation or
education. To measure SES in the late teens and early 20s, education is
probably a good option. Through adulthood, one may look at personal
income, household income, occupation, etc. Of course, they are all just
proxies for SES, with no single one capturing the complexities of the
construct on its own. So it makes sense to me to use education as a proxy
for SES at age 16-25 if they have not yet entered the workforce (or if
they entered the workforce at, say, age 20). Another alternative would be
to use the participant's partner's occupation as a proxy for their own SES
(assuming they have a partner, of course), but I don't have access to this
information.

>
>And, is it really important that all the subjects have time series of
>the same length? Many analytic methods, for example mixed linear
>models, don't require that. And when your time series inherently
>differ in length, and starting and ending period, making them all the
>same length with a data transformation seems a doubtful idea to me --
>admittedly, not in your field; indeed, not even knowing what your field
>is.

I think if the SES trajectory were based on age, the trajectories would all
be the same length. Well, give or take four years, seeing as our
participants' ages range from 60-64.
>
>Forgive my going afield, but for the reasons I've given, I'm not
>comfortable answering your last question simply on its own terms.

Not a worry. I appreciate your help so much and I understand why you
needed more information.

I'm also happy to provide more information as well!

Regards,

Jack
>

Richard Ristow

Re: Syntax help - duplicating variables

At 03:15 AM 12/1/2012, Jack Noone wrote:

>We are examining the relationship between socioeconomic status and
>health across the life course. The aim is to create a socioeconomic
>trajectory starting from early childhood to late-middle age for each
>participant and then compare it with their health trajectory over
>the same period.

Thank you! This is enough detail to go a good ways toward addressing
your problem.

I'd written,

>>Indexing values by calendar year gives very different comparisons
>>than does indexing by employment year (or, by the way, by year of
>>age). Which indexing to use depends critically on what questions
>>you want to answer.

and you make clear that you eventually want the values indexed by
subject's calendar age.

The code I posted previously gives you the occupation-based SES for
each subject for each calendar year covered by one of their
employment records. Given how you receive your data, it's best and
easiest to start with this step. That leaves, however,

1. Converting calendar year to subject's age
2. Filling in gaps for years not covered by any employment record for
the subject
and (new problem)
3. Filling in parent-based or education-based SES for years before
the commencement of employment.

The first is easy if you have all subjects' birth years: use MATCH
FILES to add the year of birth to each record, then subtract birth
year from calendar year to get subject's age. (Keep the calendar year
in the record, though, for checking and possible analysis.)
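
As a rough, untested sketch of that first step: the file name births.sav and the variable name birth_year are assumptions for illustration, not names from this thread. The lookup file is assumed to hold one record per subject and to be sorted by P_ID already.

```spss
* Assumption: births.sav has one record per P_ID with a birth_year variable.
* Both files must be in P_ID order before the table match.
SORT CASES BY P_ID year.
MATCH FILES
  /FILE=*
  /TABLE="births.sav"
  /BY P_ID.
* Age reached in each calendar year; year itself stays in the record.
COMPUTE age = year - birth_year.
EXECUTE.
```

Using /TABLE rather than a second /FILE spreads the single birth-year record across all of a subject's yearly records.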

You've requested and received answers about how to do the second with
the employment history converted to 'wide' format using CASESTOVARS.
Alternatively (and it's probably what I'd do myself) you can use
XSAVE logic to create a long-form file with the gaps filled in from
the SES of the last previous year with employment. (You'd actually do
this before you do step 1.)

Finally, filling in SES from other sources depends on what other
sources you have. If you have, for example, a single parental SES,
it's easy to create a record containing that SES for each year of the
subject's life, and then to drop those records for the year of, and
all years after, the first employment record. SES based on subject's
education may be relevant only after the education is completed;
you'll know best about that.

But you'll end up with a 'long' file that has an SES, and the source
of that SES (keep these), computed by methods standard in your field,
for each year from the subject's birth to the last year for which
they have an employment record.

After that, decide what you want to do. If you're going to compare
with health, I'd add the health by year to this long file, before
converting to wide form. (Depending on analysis, you may or may not
*ever* convert to wide form. You'll know that better than I, except
I'm always recommending keeping data in 'long' form -- my taste, and
advice that is often, but not invariably, right.)

You wrote,

>It would make more sense to index [and create by-decade summary
>variables] by age (e.g. SES_16_25, SES_26_35).
>
>>You'd have to say what SES value to use for the decade: The
>>highest? The mean of all values that occur in the decade?
>
>The mean of all values that occur in a decade, but I am open to suggestions.

This also could be done in the long-form file (code not tested):

STRING DECADE (A5).
RECODE AGE
 (00 THRU 15 = '00_15')
 (16 THRU 25 = '16_25')
 ...
 (66 THRU HI = '66_up')
 INTO DECADE.

* BREAK includes P_ID so each subject gets one record per age band.
DATASET DECLARE BY_Decade.
AGGREGATE
 /OUTFILE=BY_Decade
 /BREAK=P_ID DECADE
 /SES=MEAN(SES).

The result would be suitable for putting into 'wide' form with
CASESTOVARS, using an /INDEX=DECADE subcommand.
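
A possible shape for that last step, again untested, and assuming the aggregated dataset holds one record per subject per age band (that is, P_ID and DECADE were both break variables):

```spss
* Assumption: BY_Decade has P_ID, DECADE, and the mean SES for that band.
DATASET ACTIVATE BY_Decade.
SORT CASES BY P_ID DECADE.
* One record per subject; one SES variable per value of DECADE.
CASESTOVARS
  /ID=P_ID
  /INDEX=DECADE
  /SEPARATOR="_".
```

With the underscore separator, the new variables should come out named along the lines of SES_16_25; a subject with no records in a given band would simply have a missing value there.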

This leaves plenty of questions, but I think it'll get you a step farther.

-Best wishes,
Richard

Jack Noone

Re: Syntax help - duplicating variables

Thanks so much Richard, you're a credit to the List.

This gives me plenty to go on.

Jack

Dr. Jack Noone
Research Fellow & LHH/ABBA Project Manager
Ageing, Work and Health Research Unit
Faculty of Health Sciences
University of Sydney

Ph: 02 9351 9411

On 7/12/12 7:23 AM, "Richard Ristow" <[hidden email]> wrote:

David Marso

Re: Syntax help - duplicating variables

Administrator

In reply to this post by Richard Ristow

Right on!
Basic concept? Normalization!
---
"I'm always recommending keeping data in 'long' form -- my taste, and
advice that is often, but not invariably, right."
