Please can anyone help with getting SPSS to read a binary file or, alternatively, to (re)create an ASCII file from one?

I am working with SPSS files distributed by UK Data Services (University of Essex) for Gary Runciman’s 1962 survey, as reported in his book:

W G Runciman, Relative Deprivation and Social Justice (RKP, 1966)

The original data were multipunched on 80-column cards: a single column can hold data for more than one variable. The data were later spread out on a new set of single-punched cards and read into SPSS using INPUT FORMAT. Later operations involved a new version of the setup file using DATA LIST instead.

During data checks I discovered that the counts for birthsno (no. of live children ever) and kidsdied (no. of children who died) were impossible:
These figures do not make sense. Further checks revealed a discrepancy between the frequencies in the codebook and the frequencies obtained from the SPSS file for these and two other (adjacent) variables. I immediately suspected a data format reading error and reported it back to UKDS.

UKDS have now privately supplied me with three files used to create the SPSS file:

d2028.bin   a binary file with multipunched data (my computer thinks it’s a movie)
r028.dat    some sort of conversion from multi-punch to ASCII (Fortran?)
do28.fnt    holecount of multipunched data?

In the package distributed by UKDS there are two *.sps files and one *.sav file:

d028.sps    original 1975 setup file using INPUT FORMAT
d028a.sps   modified setup file using DATA LIST
sn28.sav    SPSS saved file

d028.sps:

FILE NAME       RUNCIMAN,RELATIVE DEPRIVATION AND SOCIAL JUSTICE
VARIABLE LIST   CASENO,CARDNO,NEWHOME,OLDHOME,MARITAL,BIRTHSNO,KIDSDIED,KIDSLIVE,
                LEAVESCH,TEENLIVE,TEENSCH,FEESCHS,MOREEDUC,EDUCTYPE
~ ~ ~ ~
                LIFESTYL,ACCENT,AGE,SEX,OCCUP,EDUCFIN,INCOME,WIFECASH,SEENHOME,
                SEENWORK,SEENOTH
# OF CASES      1415
INPUT FORMAT    FIXED(F4.0,F2.0,3F1.0,F2.0,F1.0,F2.0,F1.0,F2.0,9F1.0,F2.0,52F1.0/
                6X,64F1.0,F2.0,8F1.0/6X,74F1.0/6X,53F1.0,2F2.0,3F1.0)
INPUT MEDIUM    SPL:D2028.DAT
~ ~ ~ ~
                BIRTHSNO,TOTAL CHILDREN INCLUDING DEAD/
                KIDSDIED,NUMBER OF DECEASED CHILDREN/
                KIDSLIVE,NUMBER OF LIVE CHILDREN UNDER 15 YRS/
                LEAVESCH,AGE EXPECT KIDS TO LEAVE SCHOOL/
                TEENLIVE,NUMBER OF LIVE CHILDREN OVER 15 YRS/

d028a.sps:

title RUNCIMAN RELATIVE DEPRIVATION AND SOCIAL JUSTICE
file handle d028a/name='/ufs3/howas/028/d2028.dat'
data list fixed file=d028a records=4
 /1 caseno 1-4 cardno01 5-6 newhome 7 oldhome 8 marital 9
    birthsno 10 kidsdied 11 kidslive 12 leavesch 13
    teenlive 16-17 teensch 18 feeschs 19 moreeduc 20
    eductype 21 madwell 22 head 23 adults 24

There is clearly an error in four variables, which were read in from the wrong columns.
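The column positions implied by the 1975 INPUT FORMAT can be tallied by hand or with a few lines of code. The sketch below is only an illustration: it assumes that the names in VARIABLE LIST pair off one-to-one, in order, with the format items of record 1, and it covers only the first eleven variables. The helper names `column_ranges` and `as_data_list` are made up for this example.

```python
# Widths taken from the start of the INPUT FORMAT for record 1:
# F4.0,F2.0,3F1.0,F2.0,F1.0,F2.0,F1.0,F2.0, then the first of the 9F1.0 items.
# ASSUMPTION: VARIABLE LIST names pair with format items in order.
FIRST_FIELDS = [
    ("caseno", 4), ("cardno", 2), ("newhome", 1), ("oldhome", 1),
    ("marital", 1), ("birthsno", 2), ("kidsdied", 1), ("kidslive", 2),
    ("leavesch", 1), ("teenlive", 2), ("teensch", 1),
]


def column_ranges(fields):
    """Return {name: (first_col, last_col)} using 1-based card columns."""
    ranges = {}
    start = 1
    for name, width in fields:
        ranges[name] = (start, start + width - 1)
        start += width
    return ranges


def as_data_list(ranges):
    """Format ranges the way DATA LIST writes them, e.g. 'birthsno 10-11'."""
    parts = []
    for name, (first, last) in ranges.items():
        cols = str(first) if first == last else f"{first}-{last}"
        parts.append(f"{name} {cols}")
    return " ".join(parts)


if __name__ == "__main__":
    print(as_data_list(column_ranges(FIRST_FIELDS)))
```

Under that pairing assumption, birthsno and kidslive come out as two-column fields, which is where the DATA LIST version above departs from the older setup file.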
The SPSS syntax supplied in d028a.sps above is wrong:

data list fixed file=d028a records=4
 /1 caseno 1-4 cardno01 5-6 newhome 7 oldhome 8 marital 9
    birthsno 10 kidsdied 11 kidslive 12 leavesch 13
    teenlive 16-17 teensch 18

It needs changing to:

data list fixed file=d028a records=4
 /1 caseno 1-4 cardno01 5-6 newhome 7 oldhome 8 marital 9
    birthsno 10-11 kidsdied 12 kidslive 13-14 leavesch 15
    teenlive 16-17 teensch 18

Unfortunately the original single-punched raw data ASCII file no longer exists. Unless SPSS can read the binary file, the correct data can only be recovered with SPSS syntax from a spread-out, single-punched ASCII file, which would have to be recreated, and I think there may be an error in lines 8-14 of the conversion file r028.dat:

COPY 10

I’ve managed to look inside the d028.fnt file: it’s some sort of holecount (see extract of cards 10 – 19 below).

10: 341 1074 :
10: :
11: 383 522 326 99 37 21 17 7 3 :
11: :
12: 1282 96 24 8 2 3 :
12: :
13: 325 1090 :
13: :
14: 594 228 165 390 25 6 4 2 1 :
14: :
15: 128 263 31 74 919 :
15: :
16: 330 1085 :
16: :
17: 384 295 206 428 54 21 14 6 2 5 :
17: :
18: 709 350 256 95 3 2 :
18: :
19: 346 172 862 26 9

I’ve had a look at the FM but am none the wiser as to how to read the binary file into SPSS, or how to recreate the first data record. Can anyone help?

John F Hall (Mr)
[Retired academic survey researcher]
Email: [hidden email]
Website: www.surveyresearch.weebly.com
SPSS start page: www.surveyresearch.weebly.com/1-survey-analysis-workshop
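One sanity check that can be run on the holecount extract in the first post: if each respondent has exactly one punch in a column, the counts for that column should add up to the number of cases (1415 in d028.sps), while a multipunched column would add up to more. The sketch below simply copies the counts as listed for the entries numbered 10 to 19. Which punch position each count belongs to cannot be recovered from the flattened listing, so only the totals are checked. The function name is hypothetical.

```python
N_CASES = 1415  # "# OF CASES 1415" in d028.sps

# Counts copied from the holecount extract, in the order they are listed.
# The punch position of each count is NOT known from the flattened extract.
HOLECOUNT = {
    10: [341, 1074],
    11: [383, 522, 326, 99, 37, 21, 17, 7, 3],
    12: [1282, 96, 24, 8, 2, 3],
    13: [325, 1090],
    14: [594, 228, 165, 390, 25, 6, 4, 2, 1],
    15: [128, 263, 31, 74, 919],
    16: [330, 1085],
    17: [384, 295, 206, 428, 54, 21, 14, 6, 2, 5],
    18: [709, 350, 256, 95, 3, 2],
    19: [346, 172, 862, 26, 9],
}


def columns_not_summing_to(holecount, n_cases):
    """Return {column: total} for every column whose counts do not total n_cases."""
    totals = {col: sum(counts) for col, counts in holecount.items()}
    return {col: total for col, total in totals.items() if total != n_cases}


if __name__ == "__main__":
    odd = columns_not_summing_to(HOLECOUNT, N_CASES)
    print("columns with unexpected totals:", odd or "none")
```

On the figures quoted, every one of the ten entries totals 1415, which is at least consistent with one punch per respondent in each of those columns.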
Administrator
John,
I suspect you would need some sort of time machine. That first code snippet takes me back some 30 years (I first used SPSS in 1983). Wondering why this 50+ year old data is of any interest in 2014? Good Luck!
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
In reply to this post by John F Hall
Have you tried renaming the file d2028.bin to d2028.txt to see what's inside?

If it is very big it might help to have a good editor; otherwise Notepad may be sufficient (unless you want to dive deep into DOS: WIN+R, cmd, type path\d2028.bin|more, Ctrl+C, Exit...).

From the definitions (INPUT FORMAT...) there seem to be 4 lines to each case, and the lines seem to have 80, 80, 80 and 66 columns respectively, making each case 306 columns holding 72+73+74+58 = 277 variables.

I would try to read in d2028.bin using something like

DATA LIST FILE='d2028.bin' RECORDS=4 /1 row1varlist (formats) /2 row2varlist (formats) ...

provided that d2028.bin is in an understandable format.

HTH,
PR
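The per-record widths can be checked mechanically from the INPUT FORMAT string quoted in the first post. The following is a rough sketch, not a general Fortran format parser: it assumes the string contains only items of the form `nFw.d` and `nX`, with records separated by `/`, and the function name `tally` is invented for the example.

```python
import re

# INPUT FORMAT as quoted from d028.sps, without the surrounding FIXED( ... ).
INPUT_FORMAT = (
    "F4.0,F2.0,3F1.0,F2.0,F1.0,F2.0,F1.0,F2.0,9F1.0,F2.0,52F1.0/"
    "6X,64F1.0,F2.0,8F1.0/6X,74F1.0/6X,53F1.0,2F2.0,3F1.0"
)

# Optional repeat count, then either Fw.d (a variable) or X (skipped columns).
ITEM = re.compile(r"^(\d*)(?:F(\d+)\.\d+|X)$")


def tally(fmt):
    """Return one (columns, variables) pair per record in the format string."""
    result = []
    for record in fmt.split("/"):
        cols = nvars = 0
        for item in record.split(","):
            match = ITEM.match(item.strip())
            if match is None:
                raise ValueError(f"unsupported format item: {item!r}")
            repeat = int(match.group(1) or 1)
            if match.group(2) is None:
                cols += repeat              # nX: skip n columns, no variable
            else:
                cols += repeat * int(match.group(2))
                nvars += repeat             # nFw.d: n variables of width w
        result.append((cols, nvars))
    return result


if __name__ == "__main__":
    for number, (cols, nvars) in enumerate(tally(INPUT_FORMAT), start=1):
        print(f"record {number}: {cols} columns, {nvars} variables")
```

On the format string as quoted, this gives 80, 80, 80 and 66 columns and 72, 73, 74 and 58 variables for the four records. The VARIABLE LIST in the post is abridged, so the variable count cannot be cross-checked against it here.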
Administrator
In reply to this post by John F Hall
I suspect that .bin file is column binary (multipunch).

At one point long ago I knew how to read those (with major eye gouging and great resistance), and the OLD SPSS manuals had some documentation in the appendix. These days you see a reference in FILE HANDLE to MODE=MULTIPUNCH, but no examples of how to read the bloody thing. Why not go with the .dat file? It sounds like someone has already converted the multipunch data, and all you need to do is hit the .dat with an appropriate DATA LIST command.
--
As far as multipunch files go, I deliberately forgot everything I ever knew about them about 15 years ago. Here is a hint from an OLD post in the SPSSX-L archive:
http://listserv.uga.edu/cgi-bin/wa?A2=ind9612&L=spssx-l&P=27638

Quoting from that:

"FILE HANDLE mpdata NAME='your file and path' / MODE=MULTIPUNCH.
DATA LIST FILE=mpdata RECORDS n etc.

where n is the number of records per respondent, and mpdata is the filename. You would attach your data file to the filename mpdata, and specify the option "MODE=MULTIPUNCH", for a multipunch file. BTW, FILE HANDLE can be used for ASCII files with record length>1024, as in the LRECL=2000 option.

If you want to read a multipunched column (as with unaided brand awareness), take a look at this example. Assume that you're interested in card 1, column 23, and awareness punches are 1-9, 0, X, and Y. Coding would work like this:

/1 AWARE01 TO AWARE09 23:4-12 AWARE10 23:3 AWARE11 23:2 AWARE12 23:1

This creates separate "dummy" variables for each punch, coded as 1 for the response code, and 0 if not coded for the respondent. You can then run TABLES or MULTIPLE RESPONSE with these variables. The numeric sequence is because SPSS reads multipunch "rows" as Y, X, 0, 1-9.

Hope this helps you out."
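The row numbering in the quoted archive post (rows read as Y, X, 0, then 1-9) can be spelled out as a small lookup. This is only an illustration of the numbering described in that post, written in Python rather than SPSS syntax. It does not read a column-binary file, and the names `punch_to_row`, `column_spec` and `dummies` are made up for the example.

```python
# Row order as described in the quoted post: Y, X, 0, then 1-9 (rows 1..12).
ROW_ORDER = ["Y", "X", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]

# Punch behind each dummy in the quoted example:
# AWARE01..AWARE09 = punches 1-9, AWARE10 = 0, AWARE11 = X, AWARE12 = Y.
VAR_PUNCHES = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "0", "X", "Y"]


def punch_to_row(punch):
    """Map a punch label (Y, X, 0-9) to its row number 1..12."""
    return ROW_ORDER.index(str(punch).upper()) + 1


def column_spec(column, punch):
    """Build the 'column:row' notation used in the quoted DATA LIST line."""
    return f"{column}:{punch_to_row(punch)}"


def dummies(punched, prefix="AWARE"):
    """Return one 0/1 value per punch position for a single card column."""
    punched = {str(p).upper() for p in punched}
    return {
        f"{prefix}{i:02d}": int(punch in punched)
        for i, punch in enumerate(VAR_PUNCHES, start=1)
    }


if __name__ == "__main__":
    print(column_spec(23, "1"), column_spec(23, "0"), column_spec(23, "Y"))
    print(dummies({"3", "X"}))
```

A respondent with punches 3 and X in the column would get AWARE03 = 1 and AWARE11 = 1, with every other dummy set to 0.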
In reply to this post by John F Hall
Bob,

Thanks for this. As far as I know it’s not a nested file. I know about the + and - zone punches, as I had to deal with many of these in the 1970s, reading them in as alpha and then converting to numeric. SPSS doesn’t like invalid combinations and is registering an error.

I’m still working my way round the binary file and so far have managed to read part of it and produce correct frequencies for two named variables, plus frequencies for single columns. The problem I now have is that the columns are not the same as the ones in the codebook, and I still have to reconcile the frequencies and values tabulated with those in the codebook. For instance, the number of children born to R, including those who died, is supposed to be in columns 10-11 but is actually in column 7. I’m using the tedious method of reading one column at a time, but at least it works.

FILE HANDLE sn28 /NAME='C:\Users\John\d2028.bin' /MODE=MULTIPUNCH.
data list file sn28 /1 serial 1-4 v105 to v120 5-20 (a).
list serial /cases 5.
freq v105 to v120.

What I’m not sure about is the number of records in the binary file: it looks like there are eight, but there may only be one.

John

From: Bob Walker [mailto:[hidden email]]

Hi John,

You’re implying that you have column-binary card image data using an 80-column card format. The FILE HANDLE command using MODE=MULTIPUNCH should be able to read your *.bin file without too much effort, I think. Keep in mind that you will need to account for 0-9, ‘x’, ‘y’, and ‘&’ by using RECODE with CONVERT. It is rather hard to figure out the layout from what you pasted, but you may have a nested record structure (i.e., one master household record, and then perhaps one per child). The hole counts should help figure that out.
Hi John,

Without a valid codebook, it will be a bit of a guessing game, so I don’t envy you. The first few columns were typically set aside for a respondent ID and columns 79-80 for a card number, but your data file may not adhere to this. I’d only suggest that you experiment with the MODE switches: for example, /MODE=MULTIPUNCH vs. /MODE=EBCDIC (the old IBM standard) may yield different output and perhaps make that alignment process a little less tedious for you.
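Before trying different MODE settings, it may help to peek at the raw bytes outside SPSS. The sketch below is a crude heuristic, not a format detector: it counts bytes that would be digits in ASCII (0x30-0x39) against bytes that would be digits in EBCDIC (0xF0-0xF9). It assumes Python's `cp037` codec as a stand-in for "EBCDIC" (there are several EBCDIC code pages), and it says nothing useful about a column-binary (multipunch) file, which is neither. The function name `looks_like` is hypothetical.

```python
def looks_like(data: bytes) -> str:
    """Very rough guess at the character encoding of a block of card data."""
    ascii_digits = sum(0x30 <= b <= 0x39 for b in data)    # '0'-'9' in ASCII
    ebcdic_digits = sum(0xF0 <= b <= 0xF9 for b in data)   # '0'-'9' in EBCDIC
    if ebcdic_digits > ascii_digits:
        return "ebcdic"
    if ascii_digits > ebcdic_digits:
        return "ascii"
    return "unknown"


# The same six characters encoded both ways (illustrative values only).
SAMPLE_ASCII = "141501".encode("ascii")
SAMPLE_EBCDIC = "141501".encode("cp037")

if __name__ == "__main__":
    print(looks_like(SAMPLE_ASCII), SAMPLE_ASCII.hex())
    print(looks_like(SAMPLE_EBCDIC), SAMPLE_EBCDIC.hex())
    # For a real file, read the first few hundred bytes in binary mode:
    # with open("d2028.bin", "rb") as handle:
    #     print(looks_like(handle.read(320)))
```

If neither count dominates, that is a hint that the file is not plain character data at all.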
Bob Walker
Surveys & Forecasts, LLC