|
Dear SPSS users. I have a problem with one of the databases that Im
processing. My DB has information about Households and their expenditures in a set of items (11 items that include food, transport, health, etc.). Each of this items has a unique code, for ex. Food is 1, health is 5, etc. The thing is that the people who prepared the database didnt put a value when the HH didnt report expenditures in one or some of the issues. What I want to do is to locate the HH which do not have the code 1 (for food, since this is the first in the list for each HH) in order to have the full list of HH no matter if they didnt reported Food expenditures. Ex. HH Item 1 1 1 2 1 5 1 4 2 1 2 2 2 5 2 4 3 2 3 3 3 4 3 5 I would like to introduce a new case with the info for HH 3 on the item 1, no matter if the item has just missing values. (I can share an extract of the DB for people interested in helping me). Thanks. __________________________________________________________________ Rodrigo Briceño Project Manager Sanigest Internacional +506 291 1200 ext. 113 Oficina Costa Rica +506 232 0830 Fax +506 886 1177 Celular <mailto:[hidden email]> [hidden email] <http://www.sanigest.com/> www.sanigest.com MSN: <mailto:[hidden email]> [hidden email] SKYPE: rbriceno1087 _____________________ This communication contains legal information which is privileged and confidential. It is for the exclusive use of the address and distribution, dissemination, copying or use by others is strictly prohibited. If you have received this communication by error, please delete the original message and e-mail us. Esta comunicación contiene información legal privilegiada y confidencial para el uso exclusivo del destinatario. La distribución, diseminación, copia u otro uso por terceras personas es estrictamente prohibida. Si usted ha recibido esta comunicación por error, le rogamos borrar el mensaje original y comunicárnoslo a esta misma dirección. |
|
Rodrigo,
I'm not sure that this is quite what you need but it does address the specific issue you asked about in your example, i.e., "I would like to introduce a new case with the info for HH 3 on the item 1, no matter if the item has just missing values." I'm going to assume that your file has a 'long' structure as you show in your example. So, you want to locate cases that don't have a specific type of record. To find those cases you need to do a subtraction of cases with that record type from all cases (i.e., set subtraction). Here's the strategy that I'd use to start with. Do a temporary select cases with an Item=1 record and save to a new file but keep only HH and item. Aggregate HH and save to a new file. You have to create a new variable when you aggregate but this variable is not needed and can be discarded This is the total. Match these two files together. Cases with a missing value for item are cases that are missing a type=1 record. Select those cases and change the value of item to 1. This is the result of the subtraction operation. Add this set of cases back into your original file. In code. * let your file0 be your current database file. It is assumed to be open. Sort cases by hh item. Do if (Item eq 1). + xsave outfile='your file1'/keep hh item. End if. Execute. Aggregate outfile='your file2'/break=hh/junk=out(item). Execute. Match files file='your file1'/file='your file2'/by hh/drop=junk. Execute. Select if (sysmis(item)). Compute item=1. Execute. Add files file=*/file='your file0'/by hh item. Execute. Gene Maguin |
|
In reply to this post by Rodrigo Briceño
At 05:22 PM 9/6/2007, Rodrigo Briceno wrote:
>My DB has information about Households and their expenditures in a set >of >items (11 items that include food, transport, health, etc.). Each of >this >items has a unique code, for ex. Food is 1, health is 5, etc. The >database [has no record] didn't put a value when the HH didn't report >expenditures in one or some of the issues. To paraphrase what you say later: you want a record for every code for every household, by filling in the ones omitted from your database, with missing data. Here's a way: use XSAVE to produce a complete set of records for every household, with no data; merge back with the original file. For demonstration, it has 5 item codes instead of 11. SPSS 15 draft output: |-----------------------------|---------------------------| |Output Created |07-SEP-2007 23:56:38 | |-----------------------------|---------------------------| HH Item VALUE 1 1 192 1 2 64 1 5 165 1 4 118 2 1 232 2 2 175 2 5 294 2 4 110 3 2 320 3 3 39 3 4 176 3 5 127 Number of cases read: 12 Number of cases listed: 12 SORT CASES BY HH Item. * Mark, if it's the first record in a household ............... . DO IF $CASENUM EQ 1. . COMPUTE #New_HH = 1. ELSE IF HH NE LAG(HH). . COMPUTE #New_HH = 1. ELSE. . COMPUTE #New_HH = 0. END IF. * For each household, when it's new, write a complete suite of . * 5 'item' records, with missing values, to a scratch file . DO IF #New_HH. . COMPUTE #Hold_It = Item. . NUMERIC Out_Item (F2). . LOOP Out_Item = 1 TO 5. . XSAVE OUTFILE=Fill_In /RENAME=(Out_Item Item = Item Discard) /KEEP = HH Item. . END LOOP. END IF. EXECUTE /* -- Force a data pass, so the XSAVE loop is executed -- */. MATCH FILES /FILE=* /FILE=Fill_In /BY HH Item /DROP=Out_Item. LIST. List |-----------------------------|---------------------------| |Output Created |07-SEP-2007 23:56:39 | |-----------------------------|---------------------------| HH Item VALUE 1 1 192 1 2 64 1 3 . 1 4 118 1 5 165 2 1 232 2 2 175 2 3 . 2 4 110 2 5 294 3 1 . 3 2 320 3 3 39 3 4 176 3 5 127 Number of cases read: 15 Number of cases listed: 15 =================================== APPENDIX: File handle and test data =================================== * ................................................................. . * ................. File for XSAVE output ..................... . FILE HANDLE Fill_In /NAME='C:\Documents and Settings\Richard\My Documents' + '\Temporary\SPSS\' + '2007-09-06 Briceno - how to do this efficiently' + ' - ' + 'Fill_In.SAV'. * ................. Test data ..................... . SET RNG = MT /* 'Mersenne twister' random number generator */ . SET MTINDEX = 1634 /* Eastern Main Line, PA, telephone book */ . DATA LIST LIST SKIP=1/ HH Item. BEGIN DATA HH Item 1 1 1 2 1 5 1 4 2 1 2 2 2 5 2 4 3 2 3 3 3 4 3 5 END DATA. FORMATS HH Item (F2). NUMERIC VALUE (F4). COMPUTE VALUE = TRUNC(RV.UNIFORM(1,351)). * ................. Post after this point ..................... . * ................................................................. . LIST. |
|
In reply to this post by Rodrigo Briceño
Hi Rodrigo,
Here's another approach. This involves a double-transpose of your data. 1. It requires that you create a subset of your data which contains ALL records but with only the following 3 variables: HH, Item, and a variable that has two or more valid values and best if it has no system missing values (let's call this variable XX). I said "best" because it will be easy to spot which records were added later on if XX has no system missing values. 2. You then restructure this file to wide form (CASESTOVARS) to create 11 item variables . The variables are indexed by the values of your Item variable and prefixed by your XX variable. 3. Then you restructure this wide form back to long form (VARSTOCASES). The result will display a rectangular data structure -- each hh will have 11 records one for each item. You will know which records in the rectangular file were NOT in your subset data as they will have missing values displayed in variable XX. 4. You then match this rectangular file to your original data set. Best regards, Florio Here's a sample code. *Step 1 - here i am creating a sample data, but in your case you may just have to submit a SAVE OUTFILE command with a DROP option to create your subset because you already have an existing SPSS data set. DATA LIST FREE / HH Item xx. BEGIN DATA. 1001008011 1 0 1001008011 2 0 1001008011 3 0 1001008011 4 0 1001008011 5 1 1001008011 6 1 1001008011 8 1 1001008011 10 1 1001008011 11 1 1001008021 1 0 1001008021 2 0 1001008021 3 0 1001008021 4 0 1001008021 7 1 1001008021 8 1 1001008021 9 0 1001008021 10 0 1001008021 11 0 END DATA. XSAVE OUTFILE 'test'. *Step 2. SORT CASES BY HH Item . CASESTOVARS /ID = HH /INDEX = Item /GROUPBY = VARIABLE . *Step 3 - in this step i commented out the delete var xx command so you can see what happens when you restructure back to long form. *Variable XX is not necessary after this step, so you can delete it if you want. VARSTOCASES /MAKE xx FROM xx.1.00 xx.2.00 xx.3.00 xx.4.00 xx.5.00 xx.6.00 xx.7.00 xx.8.00 xx.9.00 xx.10.00 xx.11.00 /INDEX = Item(11) /KEEP = HH /NULL = KEEP. *DELETE VAR xx. SAVE OUTFILE 'testa'. *step 4. MATCH FILE FILE = 'testa' /FILE = '<your original data set>' /BY HH Item. SAVE OUTFILE 'testab'. At 05:22 PM 9/6/2007, Rodrigo Briceno wrote: >Dear SPSS users. I have a problem with one of the databases that Im >processing. > > > >My DB has information about Households and their expenditures in a set of >items (11 items that include food, transport, health, etc.). Each of this >items has a unique code, for ex. Food is 1, health is 5, etc. The thing is >that the people who prepared the database didnt put a value when the HH >didnt report expenditures in one or some of the issues. > >What I want to do is to locate the HH which do not have the code 1 (for >food, since this is the first in the list for each HH) in order to have the >full list of HH no matter if they didnt reported Food expenditures. > > > >Ex. > > > >HH Item > >1 1 > >1 2 > >1 5 > >1 4 > >2 1 > >2 2 > >2 5 > >2 4 > >3 2 > >3 3 > >3 4 > >3 5 > > > >I would like to introduce a new case with the info for HH 3 on the item 1, >no matter if the item has just missing values. (I can share an extract of >the DB for people interested in helping me). Thanks. > >__________________________________________________________________ > >Rodrigo Briceño >Project Manager >Sanigest Internacional > >+506 291 1200 ext. 113 Oficina Costa Rica >+506 232 0830 Fax >+506 886 1177 Celular > <mailto:[hidden email]> [hidden email] > <http://www.sanigest.com/> www.sanigest.com > >MSN: <mailto:[hidden email]> [hidden email] >SKYPE: rbriceno1087 > >_____________________ > >This communication contains legal information which is privileged and >confidential. It is for the exclusive use of the address and distribution, >dissemination, copying or use by others is strictly prohibited. If you have >received this communication by error, please delete the original message and >e-mail us. > > >Esta comunicación contiene información legal privilegiada y confidencial >para el uso exclusivo del destinatario. La distribución, diseminación, copia >u otro uso por terceras personas es estrictamente prohibida. Si usted ha >recibido esta comunicación por error, le rogamos borrar el mensaje original >y comunicárnoslo a esta misma dirección. > > |
|
At 02:44 PM 9/8/2007, Florio Arguillas wrote:
>Here's another approach. This involves a double-transpose of your >data. Florio suggests a VARSTOCASES, instead of LOOP/XSAVE, to get full suites of blank records for every household. Here's an implementation: >2. You then restructure this file to wide form (CASESTOVARS) to >create 11 item variables . The variables are indexed by the values of >your Item variable and prefixed by your XX variable. That's often a good idea. CASESTOVARS will give you a complete set of item identifiers FOUND IN THE FILE; in a file of any size, that's commonly a complete set of identifiers overall. I don't do it, though. Instead, I rely on knowing the complete set of allowed item identifiers, and generate those directly (by LOOP/COMPUTE) in the wide-format file. >3. Then you restructure this wide form back to long form >(VARSTOCASES). The result will display a rectangular data structure >-- each hh will have 11 records one for each item. I do do this, as described. >4. You then match this rectangular file to your original data set. And I do this, as well. This implementation is tested - SPSS draft output: [Original] HH Item VALUE 1 1 192 1 2 64 1 5 165 1 4 118 2 1 232 2 2 175 2 5 294 2 4 110 3 2 320 3 3 39 3 4 176 3 5 127 Number of cases read: 12 Number of cases listed: 12 SORT CASES BY HH Item. * In a separate dataset, AGGREGATE to one record per household ... . DATASET DECLARE Fill_In. AGGREGATE OUTFILE=Fill_In /BREAK=HH /NRecrds 'Number of records for household' = NU. DATASET ACTIVATE Fill_In WINDOW=FRONT. * Add a complete suite of valid identifiers to every houseld record . VECTOR Item. (5,F2). LOOP #Item = 1 TO 5. . COMPUTE Item.(#Item) = #Item. END LOOP. * VARSTOCASES, to give every houshold a *record* for each item ID . VARSTOCASES /MAKE Item FROM Item.1 TO Item.5 /DROP=NRecrds Variables to Cases |----------------------------|---------------------------| |Output Created |11-SEP-2007 01:15:12 | |----------------------------|---------------------------| [Fill_In] Generated Variables |----|------| |Name|Label | |----|------| |Item|<none>| |----|------| Processing Statistics |-------------|-| |Variables In |7| |Variables Out|2| |-------------|-| * Join with the original file to give every household a complete . * set of item numbers. . MATCH FILES /FILE=Original /FILE=Fill_In /BY HH Item. LIST. List |-----------------------------|---------------------------| |Output Created |11-SEP-2007 01:15:13 | |-----------------------------|---------------------------| HH Item VALUE 1 1 192 1 2 64 1 3 . 1 4 118 1 5 165 2 1 232 2 2 175 2 3 . 2 4 110 2 5 294 3 1 . 3 2 320 3 3 39 3 4 176 3 5 127 Number of cases read: 15 Number of cases listed: 15 =================== APPENDIX: Test data =================== * ................................................................. . * ................. Test data ..................... . SET RNG = MT /* 'Mersenne twister' random number generator */ . SET MTINDEX = 1634 /* Eastern Main Line, PA, telephone book */ . DATA LIST LIST SKIP=1/ HH Item. BEGIN DATA HH Item 1 1 1 2 1 5 1 4 2 1 2 2 2 5 2 4 3 2 3 3 3 4 3 5 END DATA. FORMATS HH Item (F2). NUMERIC VALUE (F4). COMPUTE VALUE = TRUNC(RV.UNIFORM(1,351)). DATASET NAME Original WINDOW=FRONT. * ................. Post after this point ..................... . * ................................................................. . LIST. |
| Free forum by Nabble | Edit this page |
