|
The training set and validation set will be created from a database consisting of 100 variables and 4000 cases. One variable called "group" has 5 categories: A, B, C, D and E. The training set should be a 70% random sample from the entire database. However, the cases for the training set should be selected proportionately, according to the proportions of A, B, C, D and E in the entire data set. Assuming that the proportions of A, B, C, D and E are .05, .10, .20, .30, and .35, respectively, can this be done through syntax?
Thank you.
Johnny |
|
Hi John,
Try this:

* Give every case a random number, then sort by it within each group.
compute random = uniform(100).
SORT CASES BY group(A) random(A).
* Flag 3 of every 10 consecutive cases as validation, about 30% of each group.
compute sample = 1.
if any(MOD($casenum,10), 2, 5, 8) sample = 0.
exe.
val lab sample 0 "validation" 1 "training".
* Check the split within each group.
CROSSTABS
  /TABLES=group BY sample
  /FORMAT=AVALUE TABLES
  /CELLS=COUNT ROW
  /COUNT ROUND CELL.

Best regards,
Jan

-----Original Message-----
From: SPSSX(r) Discussion On Behalf Of Johnny Amora
Sent: Tuesday, October 21, 2008 9:10 AM
Subject: Dividing the dataset into training and validation sets
|
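For anyone who wants to check the logic of the reply outside SPSS, here is a minimal Python sketch of the same three steps: random key, sort by group then key, and flag case numbers whose remainder mod 10 is 2, 5 or 8. The function name `split_by_casenum`, the seeds, and the group counts are all made up for illustration; the counts assume the proportions in the question were meant to be .05, .10, .20, .30 and .35 of 4000 cases.

```python
import random
from collections import Counter


def split_by_casenum(groups, seed=0):
    """Mimic the SPSS steps: random key, sort by group then key, flag by case number."""
    rng = random.Random(seed)
    # compute random = uniform(100).
    cases = [(g, rng.uniform(0, 100)) for g in groups]
    # SORT CASES BY group(A) random(A).
    cases.sort()
    # $casenum is 1-based; remainders 2, 5 and 8 go to the validation set.
    return [
        (g, "validation" if casenum % 10 in (2, 5, 8) else "training")
        for casenum, (g, _) in enumerate(cases, start=1)
    ]


# Hypothetical data: 4000 cases with the assumed group shares.
counts = {"A": 200, "B": 400, "C": 800, "D": 1200, "E": 1400}
groups = [g for g, n in counts.items() for _ in range(n)]
random.Random(1).shuffle(groups)

labels = split_by_casenum(groups)
training = Counter(g for g, s in labels if s == "training")
for g in sorted(counts):
    print(g, training[g], counts[g], round(training[g] / counts[g], 3))
```

With these counts every group size is a multiple of 10, so each group ends up with exactly 70% of its cases in training. For other group sizes the split is only approximate, but it should stay within about one case of 70% per group, because the flagged case numbers are spread evenly through the sorted file.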
