Re: Reread - How can I create a case using reread?

Posted by Roberts, Michael-2 on
URL: http://spssx-discussion.165.s1.nabble.com/How-data-entry-for-TURF-analysis-tp4635246p4657437.html

Jon (and everyone else who replied with suggestions),

I have a few hundred files formatted as I described, and each one is different, since some of the data elements are optional while others are required.  I might add that these data come from different sources, hence the difficulty with standardizing them.

That said, these are NCPDP data files, and while I can work with one or two files (I did just that!), it would be ridiculously time-consuming for me to work with each file individually. Anyway, I was hoping to cut down the grunge work by using some of the excellent automating syntax from Raynald's website (processing all files in folders etc.).

To answer your question - only the first four or five fields are aligned - the rest may or may not align, and from what I have read, the reread function seems to be the best bet to resolve this problem, although I would be happy for any efficient method :)

Also, the "1C" hex value (file separator) separates each field (boundary), so it is not a problem to separate the fields. The difficulty is in getting them to align in SPSS correctly.  I failed to mention that each field is preceded by a "segment identifier" such as AMx, AMxx, Fx, Fxx, etc.  However, these identifiers are only included if the data value itself is also included in the record. Therefore I felt that the reread function would be useful in building the case based on some kind of looping algorithm(?)
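A minimal sketch, in Python rather than SPSS syntax, of building a case by looping over the 1C-delimited fields and keying each one by its leading identifier. The function name build_case, the known_ids list, and the sample identifiers (D2, E7, FQ, FY) are made up for illustration; the real identifiers would have to come from the NCPDP layout for your files.

```python
FS = "\x1c"  # ASCII file separator, used here as the field delimiter


def build_case(record, known_ids):
    """Split one record on FS and key each field by its leading identifier.

    known_ids is a hypothetical list of the identifiers you expect.
    Fields that match none of them are returned separately in 'unmatched'.
    If an identifier occurs twice in a record, the later value wins.
    """
    case = {}
    unmatched = []
    # Try longer identifiers first so that "AM1" is preferred over "AM".
    ordered = sorted(known_ids, key=len, reverse=True)
    for field in record.split(FS):
        if not field:
            continue  # skip empty fields, e.g. from doubled delimiters
        for ident in ordered:
            if field.startswith(ident):
                case[ident] = field[len(ident):]
                break
        else:
            unmatched.append(field)
    return case, unmatched


# Made-up record: one unlabelled header field, then three labelled fields.
record = FS.join(["HDR001", "D21234567", "FQPatient called, refill - ok.", "E7100"])
case, unmatched = build_case(record, ["D2", "E7", "FQ", "FY"])
```

Because every value is stored under its identifier, optional fields that are absent simply do not appear in the dictionary, so the columns stay aligned regardless of which fields a given record contains.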

Finally, my apologies for not including any sample data - I will rectify that shortly, and thank you to all who have given this a little thought.

Here is what I tried first, but it failed on the very next file I tried it on!  Clunky and impossible to do for a few hundred files.


DATA LIST FILE= 'C:\DOCUMENTS AND SETTINGS\mike r\DESKTOP\HUM_20110216_20101201.txt' records=10 /
VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 VAR9 VAR10 VAR11 VAR12 VAR13 VAR14 VAR15 VAR16 VAR17 VAR18 VAR19 VAR20 VAR21 VAR22 VAR23
VAR24,VAR25,VAR26,VAR27,VAR28,VAR29,VAR30,VAR31,VAR32,VAR33,VAR34,VAR35,VAR36,VAR37,VAR38,VAR39,VAR40,VAR41,VAR42,VAR43,VAR44
(/,1x,A11,A9,A2,A13,A10,A5,A8,A6,A4,1x,1x,A4,1x,A2,A8,1x,A3,1x,A2, A32, A4, 1x,A2,A10,1x,A2,A10,1x,1x,1x,A4,1x,A3,1x,A2,A7,1x,A4,1x,A2,A19,1x,A2,A10,1x,
A4,1x,A5,1x,A3,1x,A3,1x,A2,A8,1x,A3,2x,A4,1x,A2,A8,1x,A2,A8,2x,A4,1x,A4,1x,A12,1x).



TIA

Mike

________________________________
From: SPSSX(r) Discussion [[hidden email]] On Behalf Of Jon K Peck [[hidden email]]
Sent: Monday, August 01, 2011 10:09 PM
To: [hidden email]
Subject: Re: Reread - How can I create a case using reread?

REREAD is really intended for cases where there are varying record formats: once the record type is identified, typically via a field that is in the same columns for each type, you can reread that record using the appropriate DATA LIST specifications.  It doesn't sound like that is the case here.  How do you know which field is which?  Do you just split the record on hex 1C values?  Hex 1C is the ASCII "file separator" character?

Just splitting the data at each x1C value would be very simple with a little Python, but it seems that you still wouldn't know which field is which.  And is each field surrounded by 1C values, or is there just one at each field boundary?  How would you declare the width of these string variables?
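As a rough sketch of that kind of split in plain Python: this assumes the record start and end markers described in the original question are the ASCII STX (hex 02) and ETX (hex 03) characters, and that there is exactly one 1C at each field boundary. The function names split_records and max_widths are made up for illustration. The second function shows one possible way to pick string widths, by taking the longest value seen at each field position.

```python
import re

STX, ETX, FS = "\x02", "\x03", "\x1c"  # assumed record start, record end, field separator


def split_records(text):
    """Return a list of records, each one a list of field strings."""
    # Everything between an STX and the next ETX is treated as one record.
    bodies = re.findall(STX + "(.*?)" + ETX, text, flags=re.DOTALL)
    return [body.split(FS) for body in bodies]


def max_widths(records):
    """Longest value seen at each field position, usable as a string width."""
    widths = []
    for fields in records:
        for i, value in enumerate(fields):
            if i == len(widths):
                widths.append(0)  # first time this position has been seen
            widths[i] = max(widths[i], len(value))
    return widths


# Made-up text holding two records of different lengths.
sample = STX + FS.join(["A", "BB", "CCC"]) + ETX + STX + FS.join(["DDDD", "E"]) + ETX
records = split_records(sample)
widths = max_widths(records)
```

This only splits by position, so it does not by itself answer the question of which field is which.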

Jon Peck
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        Rich Ulrich <[hidden email]>
To:        [hidden email]
Date:        08/01/2011 07:50 PM
Subject:        Re: [SPSSX-L] Reread - How can I create a case using reread?
Sent by:        "SPSSX(r) Discussion" <[hidden email]>
________________________________



Why is there a need to do a re-read?

Read what you have as String and put the parts into a new set of variables,
placing the comments in the last variable if that is what you want;
and ignore the variables that you initially read in.  Why not?

You can apply a formatted read to any substring, if you want the
numeric translations.  That seems neater than generating a
complex format on the fly, for re-read (if that is even possible).

When you Save File, delete the original text variable.
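The same idea, sketched in Python rather than SPSS syntax: take the fields already read as strings, copy the data fields into a new list, and place any comments in the last slot. This assumes comments are the fields that start with "FY" or "FQ", as described in the original question; the function name comments_last and the " | " joiner are made up for illustration.

```python
def comments_last(fields, comment_prefixes=("FY", "FQ")):
    """Return the data fields in order, followed by one combined comment field."""
    data, comments = [], []
    for field in fields:
        if field.startswith(comment_prefixes):
            comments.append(field[2:])  # drop the two-character prefix
        else:
            data.append(field)
    # The last element always holds the comments (empty if there were none).
    return data + [" | ".join(comments)]


# Made-up fields with comments scattered among the data.
row = comments_last(["A1", "FQlate, see note.", "B2", "FYcall-back"])
```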

--
Rich Ulrich

> Date: Mon, 1 Aug 2011 12:29:05 -0400
> From: [hidden email]
> Subject: Reread - How can I create a case using reread?
> To: [hidden email]
>
> Good Morning List,
>
> I am trying to build cases out of a file but am not very familiar with the reread function, so would appreciate any help with building cases out of raw data files with the following format:
>
> Data are text files with variable length records - not all fields are included in each record. Fields are delimited by the hex equivalent of '1C' and sometimes fields are followed by delimited comments, so they may appear as fields as well. There does not appear to be any pattern as to where the comments appear, except they are preceded by "FY" or "FQ" and have commas, hyphens, and periods interspersed throughout. Each record begins with a hex equivalent of '2' and ends with '3' (non printing characters); I can modify these, that is not a problem.
>
> I have tried doing this using Input Program, but because of my lack of familiarity with "reread", my result does not carry over to the other files with their different formats - very limited!
>
> Here is what I would like to know: How would I go about looping through each record, building a case up, unless I find the comment, then reread past the comment, continue to build the case until I reach the end of the record, then reread in order to put the comment at the end of the case I am building?
>
