
Re: Stratum containing entire population plus stratum containing a sample

Posted by Rich Ulrich on May 23, 2014; 6:30pm
URL: http://spssx-discussion.165.s1.nabble.com/Stratum-containing-entire-population-plus-stratum-containing-a-sample-tp5726206p5726208.html

It always makes me nervous when someone tries to talk about the
"total population" because (a) 99.9% of all research has to treat
even a "population" as if it were a sample; (b) 99% of the questions
that I have seen (sent to mailing lists) were wrong-headed in trying
to use "population statistics"; and (c) almost every audience expects
to see the testing done in the ordinary way.

Population statistics, the ones with zero total variance, are most often
applicable to incoming data on election eve.  Beyond that, they arise in
certain roles of administering limited resources, where the measurements
can be assumed to have been made with (almost) no error....  Very rarely,
"population statistics" do apply to research which is undertaken to draw
inferences.  Are you sure you have justification for it?  (And, if so, your
presentation *must* give a clear emphasis to the unusual choice.)

In this case, the sample description that follows is not one that I can parse.
It seems that the "total population" is 3200, and the "data set" is 1600.
However, the "first stratum" is "an entire population" which further is
described as "80% of the data and 36% of the entire population".

Despite having two evocations of "entire population", that *almost* parses:
If the 80% of 1600 is a complete sampling of one characteristic, then it
would be 40% of the 3200, rather than 36%.   Is this merely a round-off
error in your presentation?
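
For anyone checking the arithmetic, here is a minimal sketch in Python using only the figures quoted in the question. The variable names are made up for illustration, and the "implied population" line is just one way the 36% figure could be reconciled, not something the original poster stated.

```python
# Figures as quoted in the question (the population total is "about 3200").
population_total = 3200
sample_total = 1600
first_stratum_share_of_sample = 0.80
stated_share_of_population = 0.36

# Size of the first stratum if it really is 80% of the 1600 observations.
first_stratum_n = first_stratum_share_of_sample * sample_total

# Share of the 3200 that this stratum would represent.
share_of_population = first_stratum_n / population_total

# Assumption for illustration: if 36% were exact, the total population
# would have to be larger than 3200.
implied_population = first_stratum_n / stated_share_of_population

print(f"first stratum n      = {first_stratum_n:.0f}")
print(f"share of 3200        = {share_of_population:.0%}")
print(f"population if 36%    = {implied_population:.0f}")
```

The gap between the computed share and the stated 36% is what prompts the round-off question above.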

 - If you are truly comparing to "complete data" in the first stratum, which,
further, is measured without error, then you might want a strategy that
compares the other means to the fixed values that are the means for the
first stratum.  I don't know what the options are for doing that.
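
One possible reading of "compare the other means to fixed values", sketched in Python for a dichotomous variable: treat the first-stratum proportion as a known constant and run a one-sample z test on a sampled component. This is only an illustration under strong assumptions (simple random sampling within the component, a large-sample normal approximation, no finite population correction, no design effects); the function name and the numbers in the usage lines are hypothetical, and this is not an SPSS procedure.

```python
import math

def one_sample_proportion_z(successes, n, fixed_p):
    """Two-sided z test of a sample proportion against a fixed value.

    fixed_p is treated as known without error (here, the proportion
    observed in the fully enumerated stratum).
    """
    if n <= 0 or not 0 < fixed_p < 1:
        raise ValueError("need n > 0 and 0 < fixed_p < 1")
    p_hat = successes / n
    # Standard error under the null hypothesis that the true proportion is fixed_p.
    se = math.sqrt(fixed_p * (1 - fixed_p) / n)
    z = (p_hat - fixed_p) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers: the census stratum shows 30%, and one sampled
# component has 32 "yes" responses out of 80.
z, p_value = one_sample_proportion_z(successes=32, n=80, fixed_p=0.30)
print(f"z = {z:.3f}, two-sided p = {p_value:.3f}")
```

Whether treating the first-stratum value as fixed is defensible is exactly the justification question raised earlier in this reply.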

--
Rich Ulrich



Date: Fri, 23 May 2014 16:16:55 +0000
From: [hidden email]
Subject: Stratum containing entire population plus stratum containing a sample
To: [hidden email]

Hello,

 

I’m working with a data set with 1600 observations from a total population of about 3200. The variables are mostly categorical, and mostly dichotomous. The data are made up of two strata, one of which is an entire population and the other a random sample. The first stratum makes up about 80% of the data and 36% of the entire population. The random sample stratum contains four components which potentially could have been strata as well, but weren’t sampled that way. The sampling plan was determined externally. Some findings will have to compare the five components (the stratum with the entire population and the four components within the other stratum). Others will be for the entire sample.

 

I first thought that the way to handle the data was to weight, and I used the formula given by Maletta (2007, available online), which divides the proportion of the entire population made up by each stratum by the proportion of the sample made up by that stratum. This reduces the influence of the first stratum while expanding the influence of the second. When I examine the differences among the five components using the SPSS Complex Samples procedure, the results come with acceptable margins of error, less than 5%.
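
As a rough illustration of the weighting rule described above (population proportion divided by sample proportion), here is a short Python sketch. The stratum labels are made up, and the second stratum's shares (64% of the population, 20% of the sample) are assumed to be the complements of the figures given for the first stratum, since there are only two strata.

```python
# Shares for each stratum: (proportion of population, proportion of sample).
# "census" = the fully enumerated stratum, "sample" = the random-sample stratum.
# The second row is inferred as the complement of the first (an assumption).
strata = {
    "census": (0.36, 0.80),
    "sample": (0.64, 0.20),
}
sample_total = 1600

# Weight = population proportion / sample proportion, per stratum.
weights = {name: pop / samp for name, (pop, samp) in strata.items()}

# Check that the weighted cases still sum to the original sample size.
weighted_total = sum(
    weights[name] * samp * sample_total for name, (_, samp) in strata.items()
)

for name, w in weights.items():
    print(f"{name}: weight = {w:.2f}")
print(f"weighted total = {weighted_total:.0f}")
```

Weights of this kind rescale each stratum to its population share while leaving the weighted case count equal to the unweighted one.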

 

Is this the best way to look at the data, given that I have a complete population for one stratum and relatively small samples from the four components that make up the other one?  That is, should I honor the completeness of the data in the first stratum?

 

The write-up will be a report, rather than an article, and the audience is professional, but not necessarily research minded.

 

Many thanks,

 

Henry

 

Henry Ilian, Ph.D.

ACS Office of Quality Improvement

150 William Street, 17th Fl

New York, NY 10038 

(212) 227-5414