Hi all, I need to find a subsample of 80 cases from a pool of about 8500 cases. Each case has 11 variables (x1 to x11) whose sum=100%. The subsample I'm looking for is the one that will
give me total proportions of x1 to x11 that will be as close as possible to a preset distribution. The basic plan is to check a sample, compute proportions and magnitude of difference from the criterion distribution, print a report of sample members and statistics,
then go on to the next sample, etc. Of course, I can't check all possible combinations (almost infinite…). I know this is not a "classic" SPSS issue, but this is the tool I feel most comfortable with. Is there any way to play with BOOTSTRAP and/or SAMPLE to optimize this problem? Any
other ideas? Thanks!
|
This isn’t written very clearly at all. X1 through x11 each have a range of values and frequency distribution of those values. I’m guessing that you want
the frequency distributions of x1-x11 in the desired sample of 80 to match a specified set of frequency distributions. Most simply, suppose x1-x11 are all dichotomous (0,1) variables and in the desired sample the frequency distribution of each variable, x1-x11,
is 50%-50%. Is this an accurate statement of what you want? I don’t know what they are but I’ll bet there are algorithms for this kind of problem.
Gene Maguin
===================== To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
|
In reply to this post by Quant Sapio
If I follow you, how about selecting the 80 cases that have the lowest values of a Chi-square-like statistic? Something like this:
COMPUTE Xsq = 0.
DO REPEAT O = X1 to X11 / E = { list of preset percentages that sum to 100 }.
- COMPUTE Xsq = SUM(Xsq, (O-E)**2/E).
END REPEAT.
EXECUTE.
RANK VARIABLES=Xsq.
COMPUTE FLAG = RXsq LE 80.
FORMATS FLAG (F1).

HTH.
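For readers who want to try the same idea outside SPSS, here is a minimal Python sketch of the ranking described above. It is only an illustration: the function names, the synthetic data, and the uniform target are assumptions, not part of the original syntax. Note that it scores each case on its own, so it returns the 80 cases that are individually closest to the target.

```python
import random


def chi_square_distance(observed, expected):
    """Sum of (O - E)**2 / E over the components; every E must be > 0."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def closest_cases(cases, target, k=80):
    """Return the indices of the k cases with the smallest distance to target."""
    ranked = sorted(range(len(cases)),
                    key=lambda i: chi_square_distance(cases[i], target))
    return ranked[:k]


# Hypothetical data: 8500 cases, 11 percentages per case that sum to 100.
rng = random.Random(0)


def random_case(n_vars=11):
    raw = [rng.random() for _ in range(n_vars)]
    total = sum(raw)
    return [100 * r / total for r in raw]


cases = [random_case() for _ in range(8500)]
target = [100 / 11] * 11          # placeholder for the preset percentages
flagged = closest_cases(cases, target)
print(len(flagged), "cases flagged")
```

The sorted index list plays the role of RANK, and the slice plays the role of the FLAG variable.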
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Hi all,
I'm sorry, I didn't explain myself well enough. I have 8500 PSUs. For each one I have information regarding the distribution of garbage components:

PSU   organic   hdpe   glass   metal   etc.   total
1     10%       20%    15%     15%     ...    100%
2     20%       15%    30%     5%      ...    100%
.....

(Assume all PSUs are identical in size; that can easily be solved by weighting.)

I need to create a sample of 80 PSUs in order to track changes following a regulation change. I don't want to use traditional sampling methods, since all cluster segmentation methods seem to miss "hidden" or unmeasurable characteristics (for example, social cohesiveness of the community). What I'm trying to do is to choose a random sample of 80 PSUs that, when aggregated, has the same distribution as the total country distribution of garbage components.

Basically, what I have to do is create numerous combinations of 80 PSUs, and for each one check the aggregated distribution against the total country figures. My problem is creating the samples and aggregating their results in reasonably simple code, along the lines of (pseudo code):

For i=1 to [number of samples I want to check]
    Create random sample of n=80
    Aggregate proportions of those 80
    Add aggregated data to existing file (previous/consecutive samples), with variables that identify members of the sample
Next i

I hope I explained this riddle well... my English is not as good as it used to be!

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Bruce Weaver
Sent: Tuesday, December 23, 2014 12:32 AM
To: [hidden email]
Subject: Re: Combinatorical optimization?

> If I follow you, how about selecting the 80 cases that have the lowest values of a Chi-square-like statistic?
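The loop in the pseudo code of this message could be sketched in Python roughly as follows. This is a sketch under stated assumptions, not a tested production routine: the names `aggregate`, `distance`, and `search_samples` are made up for illustration, the data are synthetic, PSUs are treated as equal in size, and the sum of absolute differences is just one possible measure of closeness.

```python
import random


def aggregate(sample):
    """Mean of each component across the sampled PSUs (equal-size PSUs assumed)."""
    n = len(sample)
    return [sum(col) / n for col in zip(*sample)]


def distance(profile, target):
    """Sum of absolute differences between aggregated and target percentages."""
    return sum(abs(p - t) for p, t in zip(profile, target))


def search_samples(psus, target, n_draws, size=80, seed=None):
    """Draw n_draws random samples; return one record per draw, closest first."""
    rng = random.Random(seed)
    results = []
    for draw in range(n_draws):
        members = rng.sample(range(len(psus)), size)   # PSU indices in this draw
        profile = aggregate([psus[i] for i in members])
        results.append({"draw": draw,
                        "members": members,
                        "profile": profile,
                        "distance": distance(profile, target)})
    results.sort(key=lambda r: r["distance"])
    return results


# Hypothetical data: 8500 PSUs with 11 component percentages each.
rng = random.Random(1)
psus = []
for _ in range(8500):
    raw = [rng.random() for _ in range(11)]
    total = sum(raw)
    psus.append([100 * r / total for r in raw])

country = aggregate(psus)                      # stands in for the national figures
results = search_samples(psus, country, n_draws=200, seed=2)
print("best distance:", results[0]["distance"])
```

Each record keeps the member indices, so the best sample can be traced back to its PSUs, which corresponds to the "variables that identify members of the sample" step.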
|
How are you planning to analyze this carefully
selected sample? It won't be a random sample or any of the conventional
complex samples that I can think of. And even if you are able to
constrain all the marginals to meet your requirements, nothing guarantees
that the joint distribution will be optimal.
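One way to read the point about marginals versus the joint distribution is with a toy example. The Python sketch below uses two invented three-PSU samples (the numbers and function names are purely illustrative) whose aggregated percentages are identical while the way the components vary together is not.

```python
from statistics import mean


def column_means(rows):
    """Aggregated percentage of each component across the rows."""
    return [mean(col) for col in zip(*rows)]


def covariance(rows, i, j):
    """Population covariance between components i and j."""
    mi = mean(r[i] for r in rows)
    mj = mean(r[j] for r in rows)
    return mean((r[i] - mi) * (r[j] - mj) for r in rows)


# Two made-up samples; each row is one PSU and sums to 100.
sample_a = [[20, 30, 50], [40, 30, 30], [30, 30, 40]]
sample_b = [[20, 40, 40], [40, 20, 40], [30, 30, 40]]

print(column_means(sample_a))        # 30, 30, 40
print(column_means(sample_b))        # 30, 30, 40 as well
print(covariance(sample_a, 0, 1))    # 0: the second component never varies
print(covariance(sample_b, 0, 1))    # about -66.67: the two move in opposite directions
```

Both samples would pass a check on the aggregated distribution, yet they describe quite different sets of PSUs.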
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621
|
Very true, but this is precisely what I want to check (will this method be better – in this research context – than a conventional method?).
|
Nirit,

You have a multistage problem.
First is laying out the target values for the criterion distribution.
Second is pulling many random samples of 80 PSUs from your universe of 8500 PSUs.
Third, use RAKING to adjust each sample to the criterion (or find the necessary weights to do so).
Fourth, you can then rank the samples by the computed weight.

(Jon Peck can address the issue of RAKING in SPSS)

... Mark Miller
|
Precisely what I'm trying to do. But how do I automate the "pulling many random samples" and "aggregating each sample's results"?
|
Pretty trivial!
/* simulate some test data */.
PRESERVE.
SET MXLOOPS=100000.
MATRIX.
COMPUTE M=TRUNC(UNIFORM(10000,15)*50).
COMPUTE x=RSUM(M).
LOOP #=1 TO NROW(M).
COMPUTE M(#,:)=M(#,:)/x(#)*100.
END LOOP.
SAVE M / OUTFILE * / VARIABLES c01 TO c15.
END MATRIX.
DATASET NAME raw.
RESTORE.

/* overall averages */.
DATASET DECLARE agg.
DATASET DECLARE summary.
AGGREGATE OUTFILE agg/ BREAK / a01 TO a15=MEAN(c01 TO c15).

/* Create random samples */.
DATASET ACTIVATE raw.
COMPUTE @id@=$CASENUM.
COMPUTE @scramble@=RV.UNIFORM(0,1).
SORT CASES BY @scramble@.
COMPUTE @strata@=TRUNC($casenum/80-.01).

/* Sample summaries */.
AGGREGATE OUTFILE summary/BREAK @strata@/a01 TO a15=MEAN(c01 TO c15).
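The syntax above shuffles the cases once, cuts the shuffled file into consecutive blocks of 80, and aggregates each block. A rough Python equivalent is sketched below, with one extra step that the syntax leaves to the reader: ranking the blocks by how far their means are from the overall means. All names are hypothetical, the data are simulated, and the largest absolute gap is an assumed distance measure; unlike the syntax, this sketch drops a trailing block that has fewer than 80 cases.

```python
import random


def column_means(rows):
    """Mean of each column across the given rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def block_summaries(rows, block_size=80, seed=None):
    """Shuffle case indices once, cut them into full blocks, summarise each block."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)
    summaries = []
    # Stop before any incomplete final block.
    for start in range(0, len(order) - block_size + 1, block_size):
        ids = order[start:start + block_size]
        summaries.append((ids, column_means([rows[i] for i in ids])))
    return summaries


def rank_blocks(rows, block_size=80, seed=None):
    """Return (gap, ids, means) per block, smallest gap to the overall means first."""
    overall = column_means(rows)
    scored = []
    for ids, means in block_summaries(rows, block_size, seed):
        gap = max(abs(m - o) for m, o in zip(means, overall))
        scored.append((gap, ids, means))
    scored.sort(key=lambda item: item[0])
    return scored


# Simulated data in the spirit of the test data above: 10000 cases, 15 components.
rng = random.Random(3)
rows = []
for _ in range(10000):
    raw = [rng.random() for _ in range(15)]
    total = sum(raw)
    rows.append([100 * r / total for r in raw])

ranked = rank_blocks(rows, seed=4)
print(len(ranked), "blocks; smallest gap:", ranked[0][0])
```

Re-running with a different seed gives a fresh partition, so repeating the call is one way to examine more candidate samples.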
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Thank you thank you thank you!