|
I have a 1.5 million line file made up of two columns of data. The first
column has a common identifier,always "$**ID**$", before each facility's data begins, then always two lines down is the facility's unique ID number (always 9 digits ending with an H) and the next line (third line after $**ID**$) is the facility's name. Further down the rows is a specific item number I am looking for (always the same number) and its value in the column next it. This repeats 40 times (40 facilities). The number of rows of data in each facility varies. How can I select out the facility's ID number, name, desired item number and its value? example below $**ID**$ value 5902002H Facility Name (many intervening rows of items with values) item number value needed item no value (many intervening rows of items with values, then repeats 40 times) $**ID**$ value 5902002H Facility Name (many intervening rows of items with values) item number value item number value item number value item number value needed item no value (many intervening rows of items with values) |
|
Gary,
I don't have time to work out an idea but information that others may need to work out a solution are the following. 1) Are you reading a text file, or is the data already in a spreadsheet or a database? 2) Is the data structured? That is, is the number of lines of data for each facility exactly the same or does it vary from from facility to anther? 3) Is the location of the data to be extracted for a facility in exactly the same position relative to the first line for data for all facilities? 4) Is the information to be read in exactly the same columns for every record containing that information? If the answer to 1) is text, I'd suggest that you start reading the documentation for Input program-end input program. Gene Maguin |
|
Hi
I believe the following does the job. * This assumes you want the values for item number 35. DATA LIST FIXED /str 1-20 (A) val1 30-35(F). Data List will read 1 records from the command file Variable Rec Start End Format str 1 1 20 A20 val1 1 30 35 F6.0 BEGIN DATA. $**ID**$ 1 1 2 5902002H 3 ABC home 4 25 17 35 22 45 45 $**ID**$ 1 1 2 5902002H 3 ABC home 4 22 18 23 55 24 14 25 17 26 01 35 28 45 45 $**ID**$ 1 1 2 5902088H 3 XYZ home 4 35 47 45 45 56 56 75 11 END DATA. LIST. List str val1 $**ID**$ 1 1 2 5902002H 3 ABC home 4 25 17 35 22 45 45 $**ID**$ 1 1 2 5902002H 3 ABC home 4 22 18 23 55 24 14 25 17 26 1 35 28 45 45 $**ID**$ 1 1 2 5902088H 3 XYZ home 4 35 47 45 45 56 56 75 11 Number of cases read: 26 Number of cases listed: 26 COMPUTE start1= (str="$**ID**$"). IF start1=1 OR LAG(start1,2)=1 OR LAG(start1,3) OR str="35" KeepThis=1. EXECUTE. SELECT IF KeepThis=1. DO IF $CASENUM=1. - COMPUTE grp=1. ELSE. - COMPUTE grp= LAG(grp) + start1. END IF. DO IF start1=1. - COMPUTE idx=1. ELSE. - COMPUTE idx= LAG(idx) + 1. END IF. VECTOR var(4A20). COMPUTE var(idx)=str. IF idx=4 myVal = val1. AGG OUT=* /BREAK=grp /var1 TO var4 myVal =MAX(var1 TO var4 myVal). LIST. List grp var1 var2 var3 var4 myVal 1.00 $**ID**$ 5902002H ABC home 35 22.00 2.00 $**ID**$ 5902002H ABC home 35 28.00 3.00 $**ID**$ 5902088H XYZ home 35 47.00 Number of cases read: 3 Number of cases listed: 3 -- Raynald Levesque www.spsstools.net On 8/23/07, Gene Maguin <[hidden email]> wrote: > > Gary, > > I don't have time to work out an idea but information that others may need > to work out a solution are the following. > > 1) Are you reading a text file, or is the data already in a spreadsheet or > a > database? > 2) Is the data structured? That is, is the number of lines of data for > each > facility exactly the same or does it vary from from facility to anther? > 3) Is the location of the data to be extracted for a facility in exactly > the > same position relative to the first line for data for all facilities? > 4) Is the information to be read in exactly the same columns for every > record containing that information? > > If the answer to 1) is text, I'd suggest that you start reading the > documentation for Input program-end input program. > > Gene Maguin > |
|
In reply to this post by Gary Stevens-2
Since I was about finished when Raynald posted, here's a solution in a
slightly different style, but also using AGGREGATE to go from many records to one. At 12:36 PM 8/23/2007, Gary Stevens wrote: >I have a 1.5 million line file made up of two columns of data. The >first >column has a common identifier,always "$**ID**$", before each >facility's data begins, then always two lines down is the facility's >unique ID number (always 9 digits ending with an H) and the next line >(third line after $**ID**$) is the facility's name. Further down the >rows is a specific item number I am looking for (always the same >number) and its value in the column next it. Like this, except I didn't read carefully enough, so the ID is a number, rather than a string of digits ending in 'H': |-----------------------------|---------------------------| |Output Created |23-AUG-2007 23:10:07 | |-----------------------------|---------------------------| Item VALUE $**ID**$ . (ignore) . Alpha . 20050 . 17 94 18 9 29 198 42 121 49 145 60 155 66 8 $**ID**$ . (ignore) . Beta . 30076 . 9 108 17 44 30 31 64 89 79 24 92 114 $**ID**$ . (ignore) . Gamma . 20058 . 6 128 17 48 36 116 54 85 63 148 64 135 Number of cases read: 31 Number of cases listed: 31 >How can I select out the facility's ID number, name, desired item >number and its value? Raynald selected item 35; I'm selecting item 17. This is SPSS 15 draft output: STRING Facility (A8). NUMERIC Fclty# (N5). NUMERIC Item# (F3). NUMERIC Value$ (F6). LEAVE Facility. DO IF Item EQ '$**ID**$'. . COMPUTE #LineNo = 0. . COMPUTE Facility = ' '. END IF. COMPUTE #LineNo = #LineNo + 1. IF #LineNo EQ 3 Facility = Item. IF #LineNo EQ 4 Fclty# = NUMBER(Item,F8). DO IF #LineNo GT 4. DO IF NUMBER(Item,F8) EQ 17. . COMPUTE Item# = NUMBER(Item,F8). . COMPUTE Value$ = Value. . END IF. END IF. . /*-- PRINT /*-*/ /*-- / #LineNo (F3) ': ' Item Value /*-*/ /*-- / ' -> ' Facility Fclty# Item# Value$ /*-*/. . /*-- EXECUTE /*-*/. SELECT IF Facility NE ' '. AGGREGATE OUTFILE=* /BREAK = Facility /Fclty# 'Facility ID number' = FIRST(Fclty#) /Item# 'Data item number' = FIRST(Item#) /Value 'Value of data item ' = FIRST(Value$). VARIABLE LABELS Facility 'Facility name' Fclty# 'Facility ID number'. LIST. List |-----------------------------|---------------------------| |Output Created |23-AUG-2007 23:10:08 | |-----------------------------|---------------------------| Facility Fclty# Item# Value Alpha 20050 17 94 Beta 30076 17 44 Gamma 20058 17 48 Number of cases read: 3 Number of cases listed: 3 =================== APPENDIX: Test data =================== * ................................................................. . * ................. Test data ..................... . SET RNG = MT /* 'Mersenne twister' random number generator */ . SET MTINDEX = 3782 /* Providence, RI telephone book */ . NEW FILE. INPUT PROGRAM. . STRING Item (A9). . NUMERIC VALUE (F3). . LOOP #Fclty = 1 TO 3. * ... Key to begin a new facility ................ . . COMPUTE Item = '$**ID**$'. . END CASE. * ... Ignored line, following beginning of a facility . . COMPUTE Item = '(ignore)'. . END CASE. * ... Facility name ................ . . RECODE #Fclty (1 = 'Alpha') (2 = 'Beta' ) (3 = 'Gamma') INTO Item. . END CASE. * ... Facility number ................ . . COMPUTE #Fclt# = 1E4*TRUNC(RV.UNIFORM(0,4)). . COMPUTE #Fclt# = #Fclt# + TRUNC(RV.UNIFORM(1,200)). . COMPUTE Item = STRING(#Fclt#,N5). . END CASE. * ... Data values ................ . . LOOP #Data# = 1 TO 100. . DO IF RV.BERNOULLI(.07) OR #Data# = 17. . COMPUTE Item = STRING(#Data#,F3). . COMPUTE Value = TRUNC(RV.UNIFORM(1,200)). . END CASE. . END IF. . END LOOP /* for one data value per pass */. . END LOOP. /* for one facility per pass */. END FILE. END INPUT PROGRAM. * ................. Post after this point ..................... . |
|
In reply to this post by Gary Stevens-2
I think it may be simpler than any of the suggestions so far-- this
seems like a nested file type (See blurb below from SPSS help topics "nested"), that can be read in with appropriate syntax--(they provide an example in help that you could emulate.) I believe this assumes that you have a data file that is not an SPSS file NESTED Nested file type. NESTED defines a file in which the record types are related to each other hierarchically. The record types are grouped together by a case identification number that identifies the highest level-the first record type-of the hierarchy. Usually, the last record type specified-the lowest level of the hierarchy-defines a case. For example, in a file containing household records and records for each person living in the household, each person record defines a case. Information from higher record types may be spread to each case. For example, the value for a variable on the household record, such as CITY, can be spread to the records for each person in the household. Look into this option and see if it fits what you need. Melissa -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gary Stevens Sent: Thursday, August 23, 2007 11:37 AM To: [hidden email] Subject: [SPSSX-L] finding unique data in a column I have a 1.5 million line file made up of two columns of data. The first column has a common identifier,always "$**ID**$", before each facility's data begins, then always two lines down is the facility's unique ID number (always 9 digits ending with an H) and the next line (third line after $**ID**$) is the facility's name. Further down the rows is a specific item number I am looking for (always the same number) and its value in the column next it. This repeats 40 times (40 facilities). The number of rows of data in each facility varies. How can I select out the facility's ID number, name, desired item number and its value? example below $**ID**$ value 5902002H Facility Name (many intervening rows of items with values) item number value needed item no value (many intervening rows of items with values, then repeats 40 times) $**ID**$ value 5902002H Facility Name (many intervening rows of items with values) item number value item number value item number value item number value needed item no value (many intervening rows of items with values) PRIVILEGED AND CONFIDENTIAL INFORMATION This transmittal and any attachments may contain PRIVILEGED AND CONFIDENTIAL information and is intended only for the use of the addressee. If you are not the designated recipient, or an employee or agent authorized to deliver such transmittals to the designated recipient, you are hereby notified that any dissemination, copying or publication of this transmittal is strictly prohibited. If you have received this transmittal in error, please notify us immediately by replying to the sender and delete this copy from your system. You may also call us at (309) 827-6026 for assistance. |
| Free forum by Nabble | Edit this page |
