Trouble reading .csv containing Euro signs

Ruben Geert van den Berg
Dear all,

I'm having trouble reading a .csv file containing Euro signs. I thought that UNICODE, or otherwise LOCALE, should prevent such trouble, but neither does. I tried:

set locale='english' unicode on.

GET DATA
 /TYPE=TXT
 /FILE="C:\temp\GA_export.csv"
 /DELCASE=LINE
 /DELIMITERS=","
 /QUALIFIER='"'
 /ARRANGEMENT=DELIMITED
 /FIRSTCASE=2
 /IMPORTCASE=ALL
 /VARIABLES=
 Day A28
 Revenue A12.
CACHE.
EXECUTE.

Does anybody understand what's going wrong/why and how to fix it?

A sample of the input file may be downloaded here: https://dl.dropboxusercontent.com/u/116120595/GA_export.csv

TIA!

Re: Trouble reading .csv containing Euro signs

Jon K Peck
The file you supplied is actually encoded in code page 1252, not Unicode. In that code page the Euro sign is the single byte x80; in Unicode its code point is x20AC. So you need to read the file in code page mode. In Statistics 21, you could add /ENCODING='Locale' to the GET DATA command. That subcommand is not available in V20, but an encoding specification is available on DATA LIST in that version.
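
To make the byte-level difference concrete, here is a small Python sketch (Python is used only for illustration; it is not part of the original exchange). It shows that the Euro sign is one byte in code page 1252 but three bytes in UTF-8, and that a lone x80 byte cannot be decoded as UTF-8, which is presumably why the read goes wrong in Unicode mode.

```python
# Illustration only: how the Euro sign is stored in code page 1252 versus UTF-8.
euro = "\u20ac"  # the Euro sign, Unicode code point U+20AC

cp1252_bytes = euro.encode("cp1252")  # a single byte, 0x80
utf8_bytes = euro.encode("utf-8")     # three bytes, 0xE2 0x82 0xAC

print(cp1252_bytes.hex(), utf8_bytes.hex())

# A lone 0x80 byte is not valid UTF-8, so a reader expecting UTF-8 cannot decode it.
try:
    b"\x80".decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# Decoding with the matching code page recovers the character.
recovered = b"\x80".decode("cp1252")
print(decodes_as_utf8, recovered == euro)
```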

Alternatively, if you are running Statistics in code page mode, the file is assumed to be encoded in the locale's code page unless a BOM (byte order mark) is present.
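
The BOM rule can be sketched in Python as follows. This is only an illustration of the idea, not how Statistics implements it: the function name `guess_encoding` and the cp1252 fallback are assumptions, and UTF-32 BOMs are deliberately ignored to keep the sketch short.

```python
import codecs


def guess_encoding(first_bytes: bytes, fallback: str = "cp1252") -> str:
    """Return an encoding name based on a leading BOM, else the fallback code page.

    Hypothetical helper for illustration; the fallback is an assumed locale code page.
    """
    if first_bytes.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the BOM while decoding
    if first_bytes.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # this codec uses the BOM to pick the byte order
    return fallback         # no BOM: assume the code page


# A file without a BOM falls back to the code page.
print(guess_encoding(b"Day,Revenue\r\n"))
```

A file saved as UTF-8 with a BOM would start with the bytes EF BB BF and would be recognised as Unicode regardless of the fallback.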

HTH,

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
phone: 720-342-5621

From: Ruben Geert van den Berg
Date: 06/04/2013 02:40 AM
Subject: [SPSSX-L] Trouble reading .csv containing Euro signs

Re: Trouble reading .csv containing Euro signs

Ruben Geert van den Berg
Thanks Jon!

Just switching Unicode off did the trick.

I guess the problem is that the Euro sign is a non-ASCII character, and in that case the text encoding really matters.
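
A quick way to see this, sketched in Python (the helper name `encodes_identically` and the sample strings are made up for illustration): plain ASCII text produces the same bytes in code page 1252 and UTF-8, so the encoding goes unnoticed until a character like the Euro sign shows up.

```python
def encodes_identically(text: str) -> bool:
    """True if the text yields the same bytes in code page 1252 and in UTF-8."""
    return text.encode("cp1252") == text.encode("utf-8")


# ASCII-only text is byte-for-byte identical in both encodings.
print(encodes_identically("Revenue"))

# Text containing the Euro sign (U+20AC) is not.
print(encodes_identically("\u20ac 12.50"))
```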

Lesson learned.