systematic order vs random order of cases - does it matter


msherman

Dear List: I asked my students to create a data set of 10,000 cases of six different colors in varying proportions (labeled 1 2 3 4 5 6). Some students created data sets that had the colors arranged in a systematic order, with 1600 1’s, 1400 2’s, etc. Other students created data sets in which the colors were randomly distributed throughout the file, so that the first cases might be something like 1, 4, 2, 1, 6, 6, 3, 2, 1 (in contrast to the systematic order, where the first cases would all be the same color). I then asked the students to obtain a random sample (via SPSS) of 100 cases from the 10,000 cases. My question is whether the two approaches both yield a truly random sample (that is, is the systematic arrangement of colors okay to use?). Thoughts on this?

 

Martin F. Sherman, Ph.D.

Professor of Psychology

Director of  Masters Education in Psychology: Thesis Track

 

Loyola University Maryland

Department of Psychology

222 B Beatty Hall

4501 North Charles Street

Baltimore, MD 21210

 

410-617-2417

[hidden email]

 


Re: systematic order vs random order of cases - does it matter

Art Kendall
When the data are in systematic order, some people recommend systematic selection with a random start.
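
As a rough illustration of systematic selection with a random start, here is a minimal Python sketch (not SPSS syntax). Only the 1600 and 1400 counts come from the original post; the other four counts, and the function names, are assumptions made up for this example.

```python
import random
from collections import Counter

# Hypothetical counts: only 1600 and 1400 appear in the post; the rest are assumed.
COUNTS = {1: 1600, 2: 1400, 3: 2500, 4: 2000, 5: 1500, 6: 1000}

def sorted_population(counts):
    """Build the 'systematic' file: all 1's first, then all 2's, and so on."""
    return [color for color, n in counts.items() for _ in range(n)]

def systematic_sample(cases, n, rng):
    """Take every k-th case, beginning at a random start within the first k cases."""
    k = len(cases) // n        # sampling interval (100 for 10,000 cases, n = 100)
    start = rng.randrange(k)   # random start in [0, k)
    return cases[start::k][:n]

population = sorted_population(COUNTS)
sample = systematic_sample(population, 100, random.Random(2013))
print(Counter(sample))
```

With these assumed counts, every color block is a multiple of the interval, so the sample reproduces the population proportions exactly whatever the random start. That is the appeal of this design for a sorted file, and also a reminder that it behaves differently from a simple random sample.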

An interesting exercise: concatenate the "population" files into one data set and compare the results for each set, and for all the cases, to the "grand pop" stats used to generate them. Do likewise with the "sample" files.
In addition, for the "sample" files, compare their stats to the stats for the "grand pop" and to the "pop" they were sampled from.
Maybe they'll get a gut feel for the central limit theorem.
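
A simplified version of this exercise can be sketched in Python (not SPSS): draw many samples of 100 from one population file and compare the sample proportions of color 1 with the population proportion. The counts other than 1600 and 1400, the number of repetitions, and the seed are all assumptions chosen for illustration.

```python
import random
import statistics

# Hypothetical counts: only 1600 and 1400 appear in the post; the rest are assumed.
COUNTS = {1: 1600, 2: 1400, 3: 2500, 4: 2000, 5: 1500, 6: 1000}
population = [color for color, n in COUNTS.items() for _ in range(n)]

def proportion_of(color, cases):
    """Share of cases that carry the given color code."""
    return sum(1 for c in cases if c == color) / len(cases)

rng = random.Random(1)                 # arbitrary seed so the run can be repeated
pop_p1 = proportion_of(1, population)  # population proportion of color 1

# 1,000 simple random samples of 100 cases each, drawn without replacement.
sample_p1 = [proportion_of(1, rng.sample(population, 100)) for _ in range(1000)]
mean_p1 = statistics.mean(sample_p1)
sd_p1 = statistics.stdev(sample_p1)

print(f"population: {pop_p1:.3f}  mean of samples: {mean_p1:.3f}  sd: {sd_p1:.3f}")
```

Individual samples wander around the population value, but their average sits close to it, and a histogram of `sample_p1` would look roughly bell-shaped.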
Art Kendall
Social Research Consultants
(In reply to msherman's post of 11/14/2013, 12:05 PM.)

Re: systematic order vs random order of cases - does it matter

Jon K Peck
In reply to this post by msherman
Well, it depends on how they do it.  In the second case, they could just use
USE 1 TO 100
since the data are already in random order. In the first case they need to use a random number generator (or let the GUI do this for them). One thing to watch out for is how they start the random number generator. If they start from a fixed seed, save the sample dataset, and then draw the subset again in a new session, they will repeat the same random numbers they used the first time. The default is to start the generator with a random seed.
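
The seed behavior described here is not specific to SPSS. The following Python sketch (an analogue, not SPSS syntax) shows a fixed seed reproducing the same sample, and the "take the first 100 cases" shortcut for a file that is already shuffled. The counts other than 1600 and 1400, the seed value, and the function name are assumptions for illustration.

```python
import random

# Hypothetical counts: only 1600 and 1400 appear in the post; the rest are assumed.
COUNTS = {1: 1600, 2: 1400, 3: 2500, 4: 2000, 5: 1500, 6: 1000}
sorted_file = [color for color, n in COUNTS.items() for _ in range(n)]

def draw_sample(cases, n, seed=None):
    """Simple random sample of n cases; seed=None lets Python seed itself."""
    return random.Random(seed).sample(cases, n)

# Same fixed seed, as if the generator were restarted in a new session.
first_run = draw_sample(sorted_file, 100, seed=12345)
second_run = draw_sample(sorted_file, 100, seed=12345)
print(first_run == second_run)  # True: the "random" sample repeats exactly

# For a file that is already in random order, the first 100 cases will do.
shuffled_file = sorted_file[:]
random.Random().shuffle(shuffled_file)
first_100 = shuffled_file[:100]
```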


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621





Re: systematic order vs random order of cases - does it matter

J. R. Carroll
In reply to this post by msherman
Short answer: They are equivalent (sorted file vs randomized file sampling).

Remember, random sampling is meant to SAMPLE from the underlying distribution (CLT), so take the following into consideration:

Whether you force known parameters on the TRUE POPULATION by research design or there are population parameters by organic (natural) design, those parameters should (hopefully) be picked up in your "random sampling". (That is, you say "I want 1000 yellow, 500 green, 5000 red" and THAT is your TRUE population, versus allowing the PC to perform a pseudo-random generation of colors per case, with each run potentially changing your population parameters.) Regardless of the generation method used, sampling is meant to pick up the population traits (again, hopefully!) and carry them through to your sample.

Knowing this, regardless of the ORDERING in your file, whether you sorted the colors or not, when you randomly sample there is an assumption that each case has an equal chance of being picked (aka RANDOMLY SAMPLED, not SKEWED, not BIASED, no curves applied!). Just because you randomized your population in your file doesn't mean you get a "better shuffled deck" or a "wider selection of values" when you randomly sample than if you had sorted the values first and then randomly sampled: each case has an equal chance of being selected. Random sampling doesn't typically assume (unless you tell it to) that cases at the beginning of the file are less likely to be selected than cases in the middle of the file; it is indifferent to ordinal position in its selection. If it weren't, you would have ALL sorts of carryover effects in your sampling, which in most cases is undesirable.

I'd say by sheer virtue of basic stats 101 for "random" sampling they are equivalent methodologies (assuming you are allowing each case to have an equal chance of being randomly sampled).
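
The claim that selection is indifferent to file position can be checked empirically. This is a minimal Python sketch (not SPSS) that repeatedly draws 100 case positions out of 10,000 and tallies how often the picks land in the first tenth of the file versus a middle tenth. The number of repetitions, the seed, and the choice of which tenths to compare are assumptions for illustration.

```python
import random

N, n, draws = 10_000, 100, 2_000   # file size, sample size, repetitions (assumed)
rng = random.Random(42)            # arbitrary seed so the run can be repeated

first_tenth = range(0, 1_000)        # case positions at the top of the file
middle_tenth = range(4_500, 5_500)   # case positions in the middle of the file

hits_first = hits_middle = 0
for _ in range(draws):
    picked = rng.sample(range(N), n)   # positions chosen without replacement
    hits_first += sum(1 for i in picked if i in first_tenth)
    hits_middle += sum(1 for i in picked if i in middle_tenth)

share_first = hits_first / (n * draws)
share_middle = hits_middle / (n * draws)
print(f"first tenth: {share_first:.3f}  middle tenth: {share_middle:.3f}")
```

Both shares should come out near 0.10, because each tenth of the file holds a tenth of the cases and the sampler ignores position. What sits at those positions, sorted colors or shuffled ones, never enters the selection.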

(and as an aside:  "random" in computer science is a misnomer; you can pray for random --> you get pseudo-random... but, we know what you meant =P -- look into "seeding").  

----


J. R. Carroll
Cell:  (650) 776-6613
          [hidden email]
          [hidden email]




Re: systematic order vs random order of cases - does it matter

John F Hall
In reply to this post by msherman

Martin

 

It shouldn’t make any difference, but when I did this sort of thing on a much smaller data set, I got each student to set a different seed, preferably with a very high value.

 

SET SEED <yy,mm,dd>.   [date of birth, but my students’ ages ranged from 18 to 60+]

SAMPLE <n> FROM <N>.

 

David Marso supplied a routine to this list a while back that could take 100 samples and list values of means, percentages, etc.
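
For readers without SPSS to hand, the idea of giving each student a personal seed and drawing n cases from N can be sketched in Python. This is an analogue of the approach above, not a translation of the SPSS commands or of David Marso's routine. The student names, seed values, and the counts other than 1600 and 1400 are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical counts: only 1600 and 1400 appear in the thread; the rest are assumed.
COUNTS = {1: 1600, 2: 1400, 3: 2500, 4: 2000, 5: 1500, 6: 1000}
population = [color for color, n in COUNTS.items() for _ in range(n)]

def sample_with_seed(seed, n=100):
    """Draw n cases without replacement using a generator started from seed."""
    return random.Random(seed).sample(population, n)

# Hypothetical students, each seeding with a date of birth written as yymmdd.
student_seeds = {"student_a": 850321, "student_b": 951207}

for name, seed in student_seeds.items():
    tally = Counter(sample_with_seed(seed))
    # With n = 100 the counts per color can be read directly as percentages.
    print(name, {color: tally[color] for color in sorted(COUNTS)})
```

Different seeds give each student a different sample, while any one student can rerun the draw and get the same cases back.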

 

John F Hall (Mr)

[Retired academic survey researcher]

 

Email:   [hidden email] 

Website: www.surveyresearch.weebly.com

SPSS start page:  www.surveyresearch.weebly.com/spss-without-tears.html

  

  

 

 

 

 
