Weighting

Weighting

Heiko Klemm
Dear List,

 

I'm looking for freeware to calculate weights. The software should be able to calculate multidimensional weights, i.e. weights that are suitable for more than one distribution (rim weighting with many variables), and should have an algorithm to optimize for the smallest possible weights. Any suggestions?

 

Regards

 

Heiko Klemm
MRSC Division Business Services
 
Deutsche Post World Net
Business Consulting GmbH
Market Research Service Center
Heussallee/Tulpenfeld 2
53113 Bonn
 
Phone    +49 228 2435-740

Fax         +49 228 2435-799
eMail      [hidden email]

 

Geschäftsführung: Dr. Clemens Beckmann, Holger Winklbauer
Geschäftsleitung MRSC: Michael Seitz, Sabine Menzel
Sitz Bonn, Tulpenfeld 1, 53113 Bonn
Registergericht Bonn, HRB 9000

 

Re: Weighting

Peck, Jon
Have you looked at the rake module on SPSS Developer Central (www.spss.com/devcentral)?


Re: Weighting

Sebastián Daza
Dear Jon,
Do you have an example of how to use the rake module? Greetings and thank you.
--
Sebastián Daza Aranzaes
Instituto de Sociología UC
8-471 53 87 / 686 57 20 / Fax 5521834
[hidden email]


Re: Weighting

Eugenio Grant
In reply to this post by Peck, Jon
Heiko:

The rake module requires "SPSS 15, programmability, and the Advanced
Statistics module".

I have none of those, so the next question should be: "Is there a
freeware alternative to the SPSS 15 rake module?"

Regards,


Re: Weighting

jpduarte
Eugenio, you can use the macro from Raynald's SPSS Tools,
 http://www.spsstools.net/Syntax/Compute/WeightDataBasedOn2orMoreVars.txt
and adapt it to your specific needs.

João Duarte
________________________
Hamilton Portugal
[hidden email]




Re: Weighting

Peck, Jon
In reply to this post by Sebastián Daza
Here is a simple example using the employee data.sav file after autorecoding the string variable gender into a numeric variable gendern.

 

begin program.
import spss, spssaux, rake

# Target shares: about 1/3 per job category and 1/2 per gender.
rake.rake(['jobcat', 'gendern'], [{1: .333, 2: .333, 3: .334}, {1: .50, 2: .50}],
          finalweight='wt', visible=True, delta=1, poptotal=474)
end program.

 

This says to reweight so that the three job categories each have 1/3 of the cases, and each gender is ½.  It specifies a total pop size for the reweighted data of 474 (the same as the actual size), and it names the newly calculated weight variable wt.
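To make the relation between the shares and the population total concrete, here is a small plain-Python sketch. It does not call the rake module; the helper name target_counts is made up for illustration, and it assumes the implied target for each category is simply its share multiplied by the total.

```python
def target_counts(shares, poptotal):
    """Convert category shares into target counts for a given total."""
    total_share = sum(shares.values())
    # Normalise in case the shares do not add up to exactly 1.
    return {cat: poptotal * share / total_share
            for cat, share in shares.items()}


jobcat_targets = target_counts({1: .333, 2: .333, 3: .334}, 474)
gender_targets = target_counts({1: .50, 2: .50}, 474)
print(jobcat_targets)
print(gender_targets)
```

Both sets of targets add up to the same total of 474, which is what lets the two margins be satisfied at the same time.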

 

The next example uses a dataset you can find via Google.

 

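begin program.
import spss, spssaux, rake

# Adjust the path to wherever the downloaded file was saved.
spssaux.OpenDataFile("c:/spss15proj/program/Greene_Raking_Fire_Data.sav", dataset="john")
# Weight by the obs variable before raking.
spss.Submit("weight by obs.")
# Targets given as counts, one dictionary per variable.
rake.rake(['age', 'sex'], [{0: 1140, 1: 1140}, {0: 104.6, 1: 2175.4}], finalweight='finalwt')
end program.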

 

Here the targets are given as counts rather than shares: age category 0 is adjusted to 1140 cases and age category 1 to 1140 cases; likewise, sex category 0 is adjusted to 104.6 and sex category 1 to 2175.4.

 

HTH,

Jon Peck

 

 


Re: Weighting

jpduarte
Jon, here is an example using employee data.sav with an ordinary macro.

DEFINE !pondx (!POSITIONAL !ENCLOSE('[',']'))
WEIGHT OFF.

!DO !it = 1 !TO 20.

!LET !nnn=1

!DO !var !IN (!1)
!LET !nnn2=!nnn
!IF (!nnn=1) !then !LET !varp=!var !IFEND

!IF (!nnn=2) !THEN
!LET !valp=!var
SORT CASES BY !varp.
COMPUTE pondt = pond.
IF !varp=LAG(!varp) pondt = pondt + LAG(pondt).
SORT CASES BY !varp (A) pondt (D).
IF !varp=LAG(!varp) pondt=LAG(pondt).
COMPUTE pond=pond * (!valp / pondt).
!IFEND

!IF (!nnn2=1) !THEN !LET !nnn=2 !IFEND
!IF (!nnn2=2) !THEN !LET !nnn=1 !IFEND

!DOEND

!DOEND

!ENDDEFINE.

****************  end macro ***************.

********* begin example with employee data ***************.

autorecode gender/ into gendern.

COMPUTE pond=1.

IF gendern=1 univ1=474/2.
IF gendern=2 univ1=474/2.


IF jobcat=1 univ2=474/3.
IF jobcat=2 univ2=474/3.
IF jobcat=3 univ2=474/3.

!pondx [gendern univ1 jobcat univ2].


WEIGHT OFF.
FREQ VAR= gendern jobcat.

WEIGHT BY pond.
FREQ VAR= gendern jobcat.

********* end  example with employee data ***************.

João Duarte


Re: Weighting

Sebastián Daza
In reply to this post by Peck, Jon
The rake module works excellently, thanks.

--
Sebastián Daza Aranzaes
Instituto de Sociología UC
8-471 53 87 / 686 57 20 / Fax 5521834
[hidden email]


Re: Weighting

Peck, Jon
In reply to this post by jpduarte
This is a nice, straightforward implementation of raking: adjust the weights in one dimension to match the control totals; then adjust in the other dimension and keep repeating this.  It is far from obvious that approaching this problem by raking gives the same answer as fitting a main effects log-linear model, but it does.  In this particular example, the computed weights differ slightly but give the same control totals.  Raking can have convergence problems.  I can't say how this approach would compare with the log-linear model in that respect.  It would be interesting to experiment.

A drawback of this approach is the large number of sorts required: 2 x number of control variables x number of iterations. That's 2 x 2 x 20 = 80 sorts in this example.

The only reason for sorting here, however, is to get the right cell sums of the weights.  You could eliminate that by using AGGREGATE to compute the sums within category, matching the result back to the original dataset, which would be much faster.

You could also compute all the cell sums with a little Python code, but AGGREGATE should be faster.
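For anyone curious what the Python route might look like, here is a minimal sketch of the same raking loop with the category sums collected in a dictionary instead of via sorting. It is plain Python, not the rake module and not SPSS programmability code; the function name rake_cases, the toy cases, and the targets are all invented for illustration, and there is no convergence check beyond a fixed number of passes.

```python
from collections import defaultdict


def rake_cases(cases, targets, iterations=20):
    """Rake case weights so weighted category totals match the targets.

    cases   -- list of dicts, one per case, e.g. {"gendern": 1, "jobcat": 2}
    targets -- {variable: {category: control total}}
    Returns a list of weights in case order.
    """
    weights = [1.0] * len(cases)
    for _ in range(iterations):
        for var, wanted in targets.items():
            # One pass over the data gives the weighted sum per category.
            sums = defaultdict(float)
            for case, w in zip(cases, weights):
                sums[case[var]] += w
            # Scale each case so that its category hits the control total.
            weights = [w * wanted[case[var]] / sums[case[var]]
                       for case, w in zip(cases, weights)]
    return weights


# Toy data (made up): 12 cases in a 2 x 2 layout.
cases = ([{"gendern": 1, "jobcat": 1}] * 6 + [{"gendern": 1, "jobcat": 2}] * 2 +
         [{"gendern": 2, "jobcat": 1}] * 1 + [{"gendern": 2, "jobcat": 2}] * 3)
targets = {"gendern": {1: 6, 2: 6}, "jobcat": {1: 4, 2: 8}}
weights = rake_cases(cases, targets)
```

The variable adjusted last matches its control totals exactly after every pass; the other one only approaches them, which is why the loop is repeated.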

Thanks for the example.

-Jon Peck


Re: Weighting

Hector Maletta
        Jon Peck wrote (see below): "It is far from obvious that approaching
this problem [rim weighting] by raking gives the same answer as fitting a
main effects log-linear model, but it does."

        Now there are many people without access to the RAKE procedure
(which is included only in v.15, not in previous versions of SPSS). I
wonder whether fitting a main effects log-linear model (available in
previous versions) can do the same job, even if it is not obvious
to me either why it should be so. Perhaps Jon could explain this matter a bit
further. I suggest the following two questions:
        1. Why is the outcome of RAKE equivalent to a log-linear model?

        And most importantly,

        2. How can a log-linear model be applied to generate rim weights
with two or more weighting variables?

        Hector


Re: Weighting

Peck, Jon
For a general idea of why raking is equivalent to a main effects log-linear model, you can look at

http://www.fcsm.gov/01papers/Greene.pdf, Section 2.  (ignore that part about SAS macros!)

But basically, you are modeling the counts, say, for two dimensions, as
N(i,j) = a(i)b(j)n(i,j), where N refers to the population, and n refers to the sample.

This is linear in the logs:
log(N(i,j)) = log(a(i)) + log(b(j)) + error(i,j)

so fitting that model is the basis of the connection.
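Spelled out a little more, as a sketch only: taking logs of the first equation exactly gives a sum of two main effects plus the log of the sample count. The symbols R_i and C_j for the row and column control totals are notation introduced here; they are not used in the post or in the paper.

```latex
% Model: population count = row factor x column factor x sample count
N_{ij} = a_i \, b_j \, n_{ij}
\quad\Longrightarrow\quad
\log N_{ij} = \log a_i + \log b_j + \log n_{ij}

% Control totals that the fitted counts must reproduce
\sum_j N_{ij} = R_i , \qquad \sum_i N_{ij} = C_j

% One raking cycle solves each set of constraints in turn
a_i \leftarrow \frac{R_i}{\sum_j b_j \, n_{ij}} , \qquad
b_j \leftarrow \frac{C_j}{\sum_i a_i \, n_{ij}}
```

Read this way, log n(i,j) is a known term rather than something to estimate, only the main effects are fitted, and the weight for a case in cell (i,j) is N(i,j)/n(i,j) = a(i)b(j).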

It is some distance from there to producing the actual weights in Genlog.  Reading the Python code in rake.py will explain the steps, which involve a fair amount of SPSS syntax (but not too much knowledge of Python).  One serious complication is dealing with empty cells.  Obviously a totally empty row or column will cause problems, but individual empty cells also require treatment.  A very small value is added to such cells in order to get convergence.  Without this, empty cells can cause nonconvergence.  The example that was posted with employee data.sav has an empty cell (no female custodians), but the straightforward rake still converges in this case.  You can't count on that in general.

Now, if you want to do what the rake.py module does but without SPSS 15, you can work out the steps directly in syntax.  Here is an extract from SPSS Resolution 31840 written by David Nichols.  He came up with this algorithm, which was inspired by reading Agresti's categorical data analysis book.  A lot of the complications have to do with the empty cell problem.

1) Identify the variables to use to define the weighting scheme, and the desired marginal counts for each one. The desired marginal totals for each one should add to the same number, which is the desired total value for the entire table (this may be the original weighted or unweighted sample size, or it may be a population value or some other standard value).

2) Assuming the data are not aggregated to begin with, produce an aggregated file, breaking on these variables and saving the Ns for the different combinations of the variables (the cells in the table produced by aggregating on them, or what you would see if you crosstabulated them).

3) If there are any missing combinations, add these to the aggregated file, with counts of 1e-8 (or some similar very small number).

4) Compute a desired marginal count variable for each of these weighting variables. The values of each marginal count variable should be the target count for each level of the corresponding weighting variable.

5) Compute the expected values for each cell of the table under an independence model. These are computed by taking the product of all the marginal variables and dividing it by the desired table total value (which is the sum of any of the marginal variables) to the power of one less than the number of marginal variables (i.e., for two variables, it's just the total, for three, it's the total squared, etc.).

6) Weight the data by the newly created expected values variable.

7) Run the GENLOG procedure using the variables that define the weighting structure as factors in a main effects only or independence model, using the observed count variable as a cell structure variable, and saving the expected or predicted counts from this model back to the data. Make sure the estimation converges. Tightening down the convergence criterion and bumping up the iterations can, within reason, only make things more precise.

8) The saved predicted counts are the desired weighted values of the cells. If you began with aggregated table data, you simply weight the file by these counts and you're done (the rim weights have been applied).

9) Assuming that you began with individual data, compute the rim weights by dividing the saved predicted values by the observed counts, after first deleting any cases you added for empty cells.

10) Compute a matching id variable using the weighting variables that will have a unique value for each combination of these variables. Some sort of concatenation of values will suffice (for example, if they're all numeric and integers beginning with 1 and with fewer than 10 values for each one, you can compute this as the value of one of them plus ten times the value of another one plus 100 times the value of a third, etc.).

11) Sort the file on this variable. Save the file.

12) Get the original file.

13) Repeat the creation of the matching id variable and sort the data by it.

14) Match the files, using the previously saved file as a table, matching on the match id variable and keeping the original data and the rim_weight variable.
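Steps 3 and 5 in the list above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the SPSS syntax from the resolution; the function name independence_table and the toy margins and counts are made up.

```python
from itertools import product
from math import prod


def independence_table(margins, observed, fill=1e-8):
    """Build the full cell table with observed and expected counts.

    margins  -- list of {category: target count}, one dict per variable;
                every dict must sum to the same table total
    observed -- {(cat1, cat2, ...): observed count} from the aggregation
    Returns {cell: (observed count or fill, expected under independence)}.
    """
    total = sum(margins[0].values())
    k = len(margins)
    table = {}
    for cell in product(*(m.keys() for m in margins)):
        # Step 3: missing combinations get a tiny positive count.
        obs = observed.get(cell, fill)
        # Step 5: product of the marginal targets over total^(k - 1).
        exp = prod(m[c] for m, c in zip(margins, cell)) / total ** (k - 1)
        table[cell] = (obs, exp)
    return table


# Toy example (made-up numbers): cell (2, 2) is empty in the sample.
margins = [{1: 50, 2: 50}, {1: 20, 2: 30, 3: 50}]
observed = {(1, 1): 12, (1, 2): 18, (1, 3): 25, (2, 1): 9, (2, 3): 36}
table = independence_table(margins, observed)
```

The expected values sum to the table total and reproduce every target margin, which is why weighting by them in step 6 hands GENLOG the desired margins.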

When all is said and done, you have rim weighted data. That is, the data have been weighted so that the desired marginal counts are produced. Note that this is a form of poststratification weighting, and the resulting weighted data can only be used for descriptive purposes in SPSS. Even with the SPSS Complex Samples module, SPSS currently cannot produce valid inferential statistics for post-stratification-weighted data.
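The bookkeeping in steps 9 to 14 of the list above (divide predicted by observed, build a matching id, look the weight up for each case) might look like this in plain Python. It is a sketch under stated assumptions: the names match_id and attach_rim_weights are invented, the codes are assumed to be integers from 1 to 9, and the predicted counts are toy numbers rather than GENLOG output.

```python
def match_id(values, base=10):
    """Step 10: combine small positive integer codes into one unique key.

    Assumes every code is an integer from 1 to base - 1.
    """
    key = 0
    for position, value in enumerate(values):
        key += value * base ** position
    return key


def attach_rim_weights(cases, variables, predicted, observed):
    """Steps 9 and 14: weight = predicted / observed, looked up per case.

    observed holds only cells that really occur in the data, so cells
    that were added for empty combinations drop out automatically.
    """
    table = {match_id(cell): predicted[cell] / observed[cell]
             for cell in observed}
    return [table[match_id([case[v] for v in variables])] for case in cases]


# Toy numbers (made up), two variables with codes 1 and 2.
observed = {(1, 1): 6, (1, 2): 2, (2, 1): 1, (2, 2): 3}
predicted = {(1, 1): 4.5, (1, 2): 1.5, (2, 1): 1.5, (2, 2): 4.5}
cases = [{"g": 1, "j": 1}, {"g": 2, "j": 1}]
rim_weights = attach_rim_weights(cases, ["g", "j"], predicted, observed)
```

In SPSS itself the lookup is the MATCH FILES step with the aggregated file as a table; the dictionary here plays that role.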


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Hector Maletta
Sent: Tuesday, March 13, 2007 12:16 PM
To: [hidden email]
Subject: Re: [SPSSX-L] Weighting

        Jon Peck wrote (see below): "It is far from obvious that approaching
this problem [rim weighting] by raking gives the same answer as fitting a
main effects log-linear model, but it does."

        Now there are many people without access to the RAKE procedure
(which is only included in v.15 but not on previous versions of SPSS). I
wonder whether fitting a main effects log-linear model (available in
previous versions) can do the same job, even if it is indeed not obvious
either for me why it should be so. Perhaps Jon may explain this matter a bit
further. I suggest the following two questions:
        1. Why the outcome of RAKE is equivalent to a log linear model,

        And most importantly,

        2. How a log linear model may be applied to generate rim weights
with two or more weighting variables.

        Hector

        -----Mensaje original-----
De: SPSSX(r) Discussion [mailto:[hidden email]] En nombre de Peck,
Jon
Enviado el: 13 March 2007 13:47
Para: [hidden email]
Asunto: Re: Weighting

        This is a nice, straightforward implementation of raking: adjust the
weights in one dimension to match the control totals; then adjust in the
other dimension and keep repeating this.  It is far from obvious that
approaching this problem by raking gives the same answer as fitting a main
effects log-linear model, but it does.  In this particular example, the
computed weights differ slightly but give the same control totals.  Raking
can have convergence problems.  I can't say how this approach would compare
with the log-linear model in that respect.  It would be interesting to
experiment.

        A drawback of this approach is the large number of sorts required: 2
X number of control variables x number of iterations.  That's 80 sorts in
this example.

        The only reason for sorting here, however, is to get the right cell
sums of the weights.  You could eliminate that by using AGGREGATE to compute
the sums within category, matching the result back to the original dataset,
which would be much faster.

        You could also compute all the cell sums with a little Python code,
but AGGREGATE should be faster.

        Thanks for the example.

        -Jon Peck

        -----Original Message-----
        From: SPSSX(r) Discussion [mailto:[hidden email]] On
Behalf Of João Duarte
        Sent: Tuesday, March 13, 2007 7:01 AM
        To: [hidden email]
        Subject: Re: [SPSSX-L] Weighting

        Jon, here is an example using employee data.sav with an ordinary
macro.

        DEFINE !pondx (!POSITIONAL !ENCLOSE('[',']'))
        WEIGHT OFF.

        !DO !it = 1 !TO 20.

        !LET !nnn=1

        !DO !var !IN (!1)
        !LET !nnn2=!nnn
        !IF (!nnn=1) !then !LET !varp=!var !IFEND

        !IF (!nnn=2) !THEN
        !LET !valp=!var
        SORT CASES BY !varp.
        COMPUTE pondt = pond.
        IF !varp=LAG(!varp) pondt = pondt + LAG(pondt).
        SORT CASES BY !varp (A) pondt (D).
        IF !varp=LAG(!varp) pondt=LAG(pondt).
        COMPUTE pond=pond * (!valp / pondt).
        !IFEND

        !IF (!nnn2=1) !THEN !LET !nnn=2 !IFEND
        !IF (!nnn2=2) !THEN !LET !nnn=1 !IFEND

        !DOEND

        !DOEND

        !ENDDEFINE.

        ****************  end macro ***************.

        ********* begin example with employee data ***************.

        autorecode gender/ into gendern.

        COMPUTE pond=1.

        IF gendern=1 univ1=474/2.
        IF gendern=2 univ1=474/2.


        IF jobcat=1 univ2=474/3.
        IF jobcat=2 univ2=474/3.
        IF jobcat=3 univ2=474/3.

        !pondx [gendern univ1 jobcat univ2].


        WEIGHT OFF.
        FREQ VAR= gendern jobcat.

        WEIGHT BY pond.
        FREQ VAR= gendern jobcat.

        ********* end  example with employee data ***************.

        João Duarte

        ----- Original Message -----
        From: "Peck, Jon" <[hidden email]>
        To: <[hidden email]>
        Sent: Monday, March 12, 2007 8:21 PM
        Subject: Re: Weighting


        > Here is a simple example using the employee data.sav file after
        > autorecoding the string variable gender into a numeric variable
gendern.
        >
        >
        >
        > begin program.
        >
        > import spss, spssaux, rake
        >
        >
        >
        > rake.rake(['jobcat','gendern'],[{1:.333, 2:.333,3:.334}, {1:.50, 2:.50}],
        > finalweight='wt', visible=True, delta=1, poptotal=474)
        >
        > end program.
        >
        >
        >
        > This says to reweight so that the three job categories each have 1/3 of
        > the cases, and each gender is ½.  It specifies a total pop size for the
        > reweighted data of 474 (the same as the actual size), and it names the
        > newly calculated weight variable wt.
        >
        >
        >
        > The next example uses a dataset you can find via Google.
        >
        >
        >
        > begin program.
        >
        > import spss, spssaux, rake
        >
        >
        >
        >
        > spssaux.OpenDataFile("c:/spss15proj/program/Greene_Raking_Fire_Data.sav",
        > dataset="john")
        >
        > spss.Submit("weight by obs.")
        >
        > rake.rake(['age', 'sex'], [{0: 1140, 1:1140}, {0: 104.6, 1:2175.4}],
        > finalweight='finalwt')
        >
        > end program.
        >
        >
        >
        > It adjusts age and sex totals according to the number of cases: category 0
        > of age to 1140 cases, category 1 of age to 1140 cases; and similarly for
        > the sex totals.
        >
        >
        >
        > HTH,
        >
        > Jon Peck
        >
        >
        >
        >
        >
        > ________________________________
        >
        > From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
        > Sebastián Daza
        > Sent: Monday, March 12, 2007 12:28 PM
        > To: [hidden email]
        > Subject: Re: [SPSSX-L] Weighting
        >
        >
        >
        > Dear Jon,
        > Do you have an example of how to use the rake module? Greetings and thank you.
        >
        > --
        >
        >
        >
        > Sebastián Daza Aranzaes
        > Instituto de Sociología UC
        > 8-471 53 87 / 686 57 20 / Fax 5521834
        > [hidden email]
        >
        >




Re: Weighting

Populus
In reply to this post by Peck, Jon
Jon

I am using SPSS 15.0 and would like to rim-weight online responses from access panels to make them nationally representative by age, gender, region and socio-economic grade (class).

Do I need to download Python or buy any advanced SPSS modules to run an adapted version of the program you describe below? Or can I just run it as syntax on the data file I'm seeking to weight?

Many thanks

Yours

Rick Nye

Peck, Jon wrote
Here is a simple example using the employee data.sav file after autorecoding the string variable gender into a numeric variable gendern.

 

begin program.

import spss, spssaux, rake

 

rake.rake(['jobcat','gendern'],[{1:.333, 2:.333,3:.334}, {1:.50, 2:.50}], finalweight='wt', visible=True, delta=1, poptotal=474)

end program.

 

This says to reweight so that the three job categories each have 1/3 of the cases, and each gender is ½.  It specifies a total pop size for the reweighted data of 474 (the same as the actual size), and it names the newly calculated weight variable wt.

 

The next example uses a dataset you can find via Google.

 

begin program.

import spss, spssaux, rake

 

spssaux.OpenDataFile("c:/spss15proj/program/Greene_Raking_Fire_Data.sav", dataset="john")

spss.Submit("weight by obs.")

rake.rake(['age', 'sex'], [{0: 1140, 1:1140}, {0: 104.6, 1:2175.4}], finalweight='finalwt')

end program.

 

It adjusts age and sex totals according to the number of cases: category 0 of age to 1140 cases, category 1 of age to 1140 cases; and similarly for the sex totals.
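
The two calls above give their targets in different forms: proportions plus a poptotal in the first, counts in the second. As a rough illustration in plain Python (not part of the rake module; the helper names `to_counts` and `margins_agree` are invented here), the proportions can be turned into counts, and the targets for each variable can be checked to add up to the same total. That check reflects a general property of raking, namely that all margins can only be matched at once if they describe the same total; whether the rake module itself performs such a check is not stated in this thread.

```python
def to_counts(proportions, poptotal):
    """Convert {category: share} into {category: target count}."""
    return dict((cat, share * poptotal) for cat, share in proportions.items())


def margins_agree(targets, tol=1e-6):
    """True if every variable's targets sum to the same total (within tol)."""
    totals = [sum(t.values()) for t in targets]
    return all(abs(t - totals[0]) <= tol for t in totals)


# First example: proportions with poptotal=474.
jobcat_counts = to_counts({1: .333, 2: .333, 3: .334}, 474)
gender_counts = to_counts({1: .50, 2: .50}, 474)
print(jobcat_counts, gender_counts)
print(margins_agree([jobcat_counts, gender_counts]))

# Second example: targets already given as counts for age and sex.
fire_targets = [{0: 1140, 1: 1140}, {0: 104.6, 1: 2175.4}]
print(margins_agree(fire_targets))
```

Running a check like this before raking is a cheap way to catch a typo in one of the target dictionaries.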

 

HTH,

Jon Peck

 

 

________________________________

From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Sebastián Daza
Sent: Monday, March 12, 2007 12:28 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: [SPSSX-L] Weighting

 

Dear Jon,
Do you have an example of how to use the rake module? Greetings and thank you.

--



Sebastián Daza Aranzaes
Instituto de Sociología UC
8-471 53 87 / 686 57 20 / Fax 5521834
sdaza@uc.cl

Re: Weighting

Peck, Jon
You need several things assuming that you have SPSS 15 and the Advanced Models option already.  The extras are all free.  Install the downloadable 15.0.1 patch to SPSS first, if you have not already done that.

Python 2.4 from www.python.org  (Be careful not to get 2.5)
The rest are from SPSS Developer Central (www.spss.com/devcentral)
Python plug-in for SPSS 15.0.1
supplemental modules:
spssaux.py, spssdata.py, namedtuple.py, rake.py

You can just save the supplemental modules to the python24\lib\site-packages directory, assuming you used the default Python install location.

The rake module can handle as many dimensions as the GENLOG procedure supports, which is 5 if memory serves.  If you have many cells and dimensions and your sample is far from the target distributions, you may get some extreme weights, so look at a histogram of the weights to see how much the reweighting changes your sample.
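
One way to look at the weights outside SPSS is sketched below in plain Python. It assumes the final weights have been exported to a list; the weights shown are invented for illustration. The function names `weight_summary` and `text_histogram` are hypothetical, and the effective sample size (Kish's approximation, the squared sum of weights divided by the sum of squared weights) is an addition of this sketch, not something the rake module is known to report.

```python
def weight_summary(weights):
    """Basic diagnostics for a list of case weights."""
    n = len(weights)
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    return {
        "n": n,
        "min": min(weights),
        "max": max(weights),
        "mean": total / n,
        # Kish's approximation: equals n when all weights are equal,
        # and shrinks as the weights become more unequal.
        "effective_n": total * total / sum_sq,
    }


def text_histogram(weights, bins=5, width=40):
    """Return the lines of a crude text histogram of the weights."""
    lo, hi = min(weights), max(weights)
    span = (hi - lo) or 1.0              # avoid dividing by zero
    counts = [0] * bins
    for w in weights:
        idx = min(int((w - lo) / span * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left = lo + span * i / bins
        right = lo + span * (i + 1) / bins
        bar = "#" * int(round(width * c / float(peak)))
        lines.append("%6.2f - %6.2f | %s %d" % (left, right, bar, c))
    return lines


# Hypothetical weights: most near 1, a couple of extreme ones.
example_weights = [0.5] * 4 + [1.0] * 10 + [2.0] * 4 + [6.0] * 2
summary = weight_summary(example_weights)
print(summary)
for line in text_histogram(example_weights):
    print(line)
```

A long right tail in the histogram, or an effective sample size far below the number of cases, suggests that a few cases are carrying much of the weighted total.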

HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Populus
Sent: Friday, March 23, 2007 1:15 AM
To: [hidden email]
Subject: Re: [SPSSX-L] Weighting

Jon

I am using SPSS 15.0 and would like to RIM weight online responses from
access panels to make them nationally representative by age, gender, region
and social economic grade (class)

Do I need to download Python or buy any advanced SPSS modules to run an
adapted version of the program you describe below? Or can I just run it as a
syntax on the data file I'm seeking to weight?

Many thanks

Yours

Rick Nye


