Hi,
I have some .sav data that, for some reason, contains an ASCII syn character (ordinal 22). It should have been an ampersand. When I export the file to .xlsx, MS Excel complains about an invalid XML sign. I use SPSS v22FP2 32-bit. Is SPSS or MS Office the cause of the problem? I don't have other spss versions or e.g LibreOffice to check, so I was hoping if somebody else could help me.
data list free / s (ahex100). begin data 4245 42454420162020425245414B46415354202020202020202020202020202020202020202020202020 end data.
save translate outfile = '%temp%\test.xlsx' /type=xls /version=12 /replace /fieldnames /cells=values.
* Fixing it is easy, but ugly, esp since this is hardly ever needed. string s_correct (a100) #char (a1). compute s_correct = s. loop #i = 1 to char.length(s). +compute #char = char.substr(s, #i, 1). +compute #ordinal = number(#char, pib1). +do if char.length(#char). + if ( #ordinal le 31 ) s_correct = replace(s_correct, #char, ""). +end if. end loop. execute.
Thanks!
Albert-Jan
|
Hi, Albert-Jan, same problem with v23. I tried ... test.csv' /type=csv /replace /fieldnames /cells=values /ENCODING=LOCALE. which can be opened with excel. Greetings, Mario Albert-Jan Roskam <[hidden email]> schrieb am 11:16 Mittwoch, 24.August 2016: Hi,
I have some .sav data that, for some reason, contains an ASCII syn character (ordinal 22). It should have been an ampersand. When I export the file to .xlsx, MS Excel complains about an invalid XML sign. I use SPSS v22FP2 32-bit. Is SPSS or MS Office the
cause of the problem? I don't have other spss versions or e.g LibreOffice to check, so I was hoping if somebody else could help me.
data list free / s (ahex100).
begin data
4245
42454420162020425245414B46415354202020202020202020202020202020202020202020202020
end data.
save translate outfile = '%temp%\test.xlsx' /type=xls /version=12 /replace /fieldnames /cells=values.
* Fixing it is easy, but ugly, esp since this is hardly ever needed.
string s_correct (a100) #char (a1).
compute s_correct = s.
loop #i = 1 to char.length(s).
+compute #char = char.substr(s, #i, 1).
+compute #ordinal = number(#char, pib1).
+do if char.length(#char).
+ if ( #ordinal le 31 ) s_correct = replace(s_correct, #char, "").
+end if.
end loop.
execute.
Thanks!
Albert-Jan
|
In reply to this post by Albert-Jan Roskam-3
If you save the file as xls instead of xlsx, Excel opens it without a problem. Of course, the bad character doesn't display as &. My guess is that Excel is not expecting characters in that range, but I don't know whether there is an encoding problem on the Statistics side or not. When you read data in hex format, Statistics does no character conversions, so you need to be sure that the hex codes correspond to the character encodings for the mode Statistics is in. If you do want to ensure that such a character gets replaced with something legitimate, there is an easier way to do it. compute s = replace(s, string(22, pib1),'&'). On Wed, Aug 24, 2016 at 3:15 AM, Albert-Jan Roskam <[hidden email]> wrote:
|
Hi Mario, Jon,
Thanks for your replies. Either .xls or .csv will work. Jon, your solution is a lot cleaner, even if I wrap it in a Do Repeat to replace everything below 32. Albert-Jan From: Jon Peck <[hidden email]>
Sent: Wednesday, August 24, 2016 1:47:24 PM To: Albert-Jan Roskam Cc: SPSS List Subject: Re: [SPSSX-L] ASCII control character causes invalid .xlsx If you save the file as xls instead of xlsx, Excel opens it without a problem. Of course, the bad character doesn't display as &. My guess is that Excel is not expecting characters in that range, but I don't
know whether there is an encoding problem on the Statistics side or not. When you read data in hex format, Statistics does no character conversions, so you need to be sure that the hex codes correspond to the character encodings for the mode Statistics is
in.
If you do want to ensure that such a character gets replaced with something legitimate, there is an easier way to do it.
compute s = replace(s, string(22, pib1),'&').
On Wed, Aug 24, 2016 at 3:15 AM, Albert-Jan Roskam
<[hidden email]> wrote:
|
Administrator
|
I created a weird XLS file last night by copying the tables from here:
https://www.ssa.gov/oact/NOTES/as120/LifeTables_Tbl_7.html Select all / copy/paste. Reading the XLS into SPSS caused everything to be read as string. Trying ALTER TYPE and NUMBER function utterly FAILED (all values became SYSMIS)! Upon inspection EVERY nonblank cell had a leading LF (PIB value 10, AHEX 0A) character. A bit of hair pulling and experimentation resulted in the following solution. (there were also commas in some of the tables so the last line in the DO REPEAT lock diddles that). /* For some reason we were getting a HEX 0A =PIB(10)=LF at the head of each numeric value */. ALTER TYPE ALL (AHEX). DO REPEAT v=ALL. /* Strip off the first byte of each value */. COMPUTE v=CHAR.SUBSTR(v,2). /* Comma's get stripped here */. COMPUTE v=REPLACE(v,",",""). END REPEAT. ALTER TYPE x_Male x_Female (F3) qx_Male qx_Female(F10.5) lx_Male LxA_Male Tx_Male lx_Female LxA_Female Tx_Female(Comma10) dx_Male dx_Female (Comma10) OeX_Male OeX_Female(F8.2) V8 (A8).
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
How about:
Do repeat v = All. Compute v = ltrim(v, string(10, pib1)). End repeat. Bottom line is: Excel sucks! :-) From: SPSSX(r) Discussion <[hidden email]> on behalf of David Marso <[hidden email]>
Sent: Wednesday, August 31, 2016 4:54:34 AM To: [hidden email] Subject: Re: [SPSSX-L] ASCII control character causes invalid .xlsx I created a weird XLS file last night by copying the tables from here:
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
https://www.ssa.gov/oact/NOTES/as120/LifeTables_Tbl_7.html Select all / copy/paste. Reading the XLS into SPSS caused everything to be read as string. Trying ALTER TYPE and NUMBER function utterly FAILED (all values became SYSMIS)! Upon inspection EVERY nonblank cell had a leading LF (PIB value 10, AHEX 0A) character. A bit of hair pulling and experimentation resulted in the following solution. (there were also commas in some of the tables so the last line in the DO REPEAT lock diddles that). /* For some reason we were getting a HEX 0A =PIB(10)=LF at the head of each numeric value */. ALTER TYPE ALL (AHEX). DO REPEAT v=ALL. /* Strip off the first byte of each value */. COMPUTE v=CHAR.SUBSTR(v,2). /* Comma's get stripped here */. COMPUTE v=REPLACE(v,",",""). END REPEAT. ALTER TYPE x_Male x_Female (F3) qx_Male qx_Female(F10.5) lx_Male LxA_Male Tx_Male lx_Female LxA_Female Tx_Female(Comma10) dx_Male dx_Female (Comma10) OeX_Male OeX_Female(F8.2) V8 (A8). ----- Please reply to the list and not to my personal email. Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/ASCII-control-character-causes-invalid-xlsx-tp5732998p5733024.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
That would likely work. The data were pasted from a table on the site I posted and then pasted into Open Office. I have NO idea where the 0A padding came from. I may try to directly paste it into SPSS and see what happens.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by David Marso
In recent threads on another discussion list, participants noted that Excel
autoformatting introduces errors in genetics papers because gene names are reset to dates. http://www.sciencemag.org/news/sifter/one-five-genetics-papers-contains-erro rs-thanks-microsoft-excel?utm_source=sciencemagazine&utm_medium=facebook-tex t&utm_campaign=excel-6950 -- Tony Babinec -- ASA Council of Chapters Chair, -- Joint Statistical Meetings 2017 Program Committee -- [hidden email] ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Great link, thanks. Yes, I've noticed something similar before with disease codes that were turned into dates, or other codes where leading zeros got lost. Quite annoying when the first digit means "chapter", the second "paragraph", etc. Code 0001 means
chapter 0, code 1 means chapter 1, not otherwise specified. Brrrr. From: SPSSX(r) Discussion <[hidden email]> on behalf of Anthony Babinec <[hidden email]>
Sent: Wednesday, August 31, 2016 11:34 AM To: [hidden email] Subject: Re: [SPSSX-L] ASCII control character causes invalid .xlsx In recent threads on another discussion list, participants noted that Excel
autoformatting introduces errors in genetics papers because gene names are reset to dates. http://www.sciencemag.org/news/sifter/one-five-genetics-papers-contains-erro
rs-thanks-microsoft-excel?utm_source=sciencemagazine&utm_medium=facebook-tex t&utm_campaign=excel-6950 -- Tony Babinec -- ASA Council of Chapters Chair, -- Joint Statistical Meetings 2017 Program Committee -- [hidden email] ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
I HATE trying to read XML based table crap! YIKES. Go to one of those tables and View Source. At least Excel brought it in as a tabular structure. Pasting a copy into the SPSSData Editor pushes it into a single column. Same with Syntax. That would perhaps explain all those LF characters. On Wed, Aug 31, 2016 at 7:57 AM, Albert-Jan Roskam-3 [via SPSSX Discussion] <[hidden email]> wrote:
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Albert-Jan Roskam-3
The key word here is ‘autoformatting’ (in Anthony Babinec’s reply). If Excel is allowed to make its own best guess as to the data type, it will get it wrong when it finds things that look a bit like dates, but aren’t. I am guessing that the people who’ve been unpleasantly surprised when data have been turned, inappropriately, into dates have simply allowed Excel to import a CSV file using the default data type of ‘general’, rather than setting it to ‘text’. When a field’s data type is set to ‘text’ in the import wizard, it leaves the data as it finds it and does not try to change things into dates if they look a bit like one. On “The Register” in reader discussions of the story about the DNA researchers who discovered that gene sequence codes had been turned into dates, some of the commenters got quite agitated about what they saw as Excel ‘thinking it knows better’, or ‘over-riding the user’s wishes’ about data types. However, I’ve never known Excel to alter a data type from ‘text’ if it has been defined by the user as that on import. The CSV format is a potential trap for the unwary because it does not embed the data type with each data field. So it puts an onus on users to properly define the data when they read it into Excel, and it looks as if some DNA researchers have not been aware of this. From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Albert-Jan Roskam Great link, thanks. Yes, I've noticed something similar before with disease codes that were turned into dates, or other codes where leading zeros got lost. Quite annoying when the first digit means "chapter", the second "paragraph", etc. Code 0001 means chapter 0, code 1 means chapter 1, not otherwise specified. Brrrr. From: SPSSX(r) Discussion <[hidden email]> on behalf of Anthony Babinec <[hidden email]> In recent threads on another discussion list, participants noted that Excel
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Free forum by Nabble | Edit this page |