Hello, SPSS friends,

I want to read several hundred text files of about 18 MB each into SPSS. This is not possible via the GUI, because a message says that UTF-16 encoded files cannot be accessed via the text assistant. When I try DATA LIST, some weird problems arise.

Here's an example of two lines from the files:
===================================================================================
11699947 3780688 75.9 EUR 7c9381f824c84c25374be53c43bb4494 2013-11-14 11:55:36 2013-12-13 13:48:16 1 0 CLICK - 0 Direct anyChannel Direct Direct default default 6 7 10 11 12
11699948 3780688 75.9 EUR 1bd4956a7f3bfee6e8d104aa418bc2c1 2013-11-24 11:26:24 2013-12-13 13:48:16 2 0 CLICK - 0 Direct anyChannel Direct Direct default default 6 7 10 11 12
===================================================================================

Here's my syntax:

DATA LIST FILE='Paket_001.dat' ENCODING='UTF16' LIST (TAB)
  / v1 (F8) v2 (F8) v3 (F10.2) v4 (A5) v5 (A40) v6 (A20) v7 (A20) v8 (F8)
    v9 (F8) v10 (A10) v11 (A20) v12 (F8) v13 (A10) v14 (A20) v15 (A20)
    v16 (A20) v17 (A20) v18 (A20) v19 (F8) v20 (F8) v21 (F8) v22 (F8) v23 (F8).
EXECUTE.
===================================================================================

These are the warnings I get (translated from German):

>Warning # 1187
>After conversion from UTF-16, an input record was too long.
>Command line: 171 Current case: 1 Current splitfile group: 1
>Warning # 1102
>Invalid numeric field found. The result has been set to the
>system-missing value.
>Command line: 171 Current case: 1 Current splitfile group: 1
>Field contents: 'ऱर啎䱌उ愴㍥㔷㘰挴㑢攴㔸ㄴ昱〲慣㕡戴晡!'
>Record number: 1 Starting column: 1 Record length: 12288
===================================================================================

Does anybody have an idea how I can get these files into SPSS? Any help is appreciated.

Mario
Mario Giesel
Munich, Germany
Remember that format widths are in bytes, and
UTF-16 takes at least two bytes per character. It's hard to tell from this
extract, but the widths don't seem to match up with the data. In
fact, perhaps due to the vagaries of email, the text doesn't look like
UTF-16 at all.
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Mario Giesel <[hidden email]>
To: [hidden email]
Date: 02/16/2014 09:45 AM
Subject: [SPSSX-L] Read UTF-16 text file problem
Sent by: "SPSSX(r) Discussion" <[hidden email]>
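One way to check Jon's suspicion before choosing an ENCODING value is to look at the first bytes of a file. The following is only a rough sketch in Python, not an SPSS feature; the function name `sniff_encoding` and the "more than a quarter NUL bytes" threshold are assumptions made for illustration, and UTF-32 is not handled.

```python
import codecs


def sniff_encoding(head: bytes) -> str:
    """Rough guess at a file's encoding from its first bytes (hypothetical helper)."""
    # A byte-order mark is the most reliable hint.
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith(codecs.BOM_UTF16_LE) or head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # BOM-less UTF-16 holding mostly ASCII text has a NUL in every other byte.
    # The 1/4 threshold is an arbitrary assumption, not a standard.
    if head and head.count(b"\x00") > len(head) // 4:
        return "utf-16 (no BOM)"
    # Otherwise assume a single-byte or locale encoding (or BOM-less UTF-8).
    return "8-bit/locale"


def sniff_file(path: str, nbytes: int = 4096) -> str:
    """Apply sniff_encoding to the first nbytes of the file at path."""
    with open(path, "rb") as fh:
        return sniff_encoding(fh.read(nbytes))
```

For example, `sniff_file("Paket_001.dat")` would report "8-bit/locale" for a plain ASCII file. Reading such a file as UTF-16 pairs up its bytes, so two ASCII characters become one unrelated character, which is consistent with the CJK-looking "Field contents" in the warning above.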
Thanks, Jon. I think you guessed right that the data are not in UTF-16. The error occurred when I opened "all files" instead of "text files" via the GUI. I got it working with this syntax:

GET DATA
  /TYPE=TXT
  /FILE="Paket_001.dat"
  /ENCODING='Locale'
  /DELCASE=LINE
  /DELIMITERS="\t "
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=1
  /IMPORTCASE=ALL
  /VARIABLES=
  V1 F3.0 V2 F1.0 V3 A4 V4 F1.0 V5 A32 V6 A10 V7 A8 V8 A10
  V9 A8 V10 F1.0 V11 F1.0 V12 A5 V13 A1 V14 F1.0 V15 A6 V16 A10
  V17 A6 V18 A6 V19 A7 V20 A7 V21 F1.0 V22 F1.0 V23 F1.0 V24 F2.0
  V25 F2.0 V26 F2.0 V27 F1.0 V28 F1.0 V29 F1.0 V30 F1.0 V31 F1.0 V32 F1.0.
CACHE.
EXECUTE.

Thanks a lot,
Mario

(In reply to Jon K Peck <[hidden email]>, who wrote at 22:32 on Sunday, 16 February 2014.)
Mario Giesel
Munich, Germany
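Since the original question involves several hundred files, the working GET DATA command could be repeated per file by generating the syntax with a small script. This is a sketch under assumptions, not something posted in the thread: the helper name `build_syntax`, the `.sav` output naming, and the added SAVE OUTFILE step are all hypothetical, and the variable list is passed in as text so it can be copied from the syntax above.

```python
from pathlib import Path

# Template mirroring the GET DATA command from the thread; the SAVE OUTFILE
# line is an added assumption so that each file ends up as its own .sav file.
TEMPLATE = """GET DATA
  /TYPE=TXT
  /FILE="{src}"
  /ENCODING='Locale'
  /DELCASE=LINE
  /DELIMITERS="\\t "
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=1
  /IMPORTCASE=ALL
  /VARIABLES=
  {variables}.
SAVE OUTFILE="{dst}".
"""


def build_syntax(paths, variables):
    """Return one GET DATA / SAVE block per input path (hypothetical helper)."""
    blocks = []
    for p in paths:
        src = Path(p)
        blocks.append(
            TEMPLATE.format(
                src=src.as_posix(),                      # forward slashes
                dst=src.with_suffix(".sav").as_posix(),  # Paket_001.sav etc.
                variables=variables,
            )
        )
    return "\n".join(blocks)


# Example use (the file pattern is an assumption based on 'Paket_001.dat'):
#   files = sorted(Path(".").glob("Paket_*.dat"))
#   Path("read_all.sps").write_text(build_syntax(files, "V1 F3.0 V2 F1.0"))
```

The generated text can then be saved as a syntax file and run in SPSS. The individual .sav files would still need to be combined afterwards, for example with ADD FILES.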