My survey data vendor provides exports as .sav and also .sps for a data file
that is in CSV format (sort-of). I normally use the .sav file but there is something going on with the labeling, so I'm trying to see what I can do with the data file and the syntax. The support folks at the vendor who could help with this have left for the day, so I'm hoping that the list can give me a little guidance. The data file is structured like this: "1", "11/Apr/2013", "Partial", "", "", "English", The part of the syntax file that I'm confused over is this: GET DATA /TYPE=TXT /FILE="spss.txt" /DELCASE=LINE /DELIMITERS="," /QUALIFIER='"' /ARRANGEMENT=DELIMITED /FIRSTCASE=1 /IMPORTCASE=ALL /VARIABLES= When I run the syntax, I get incorrect values in the appropriate variables. The first variable is OK, the second (type date) is blank, the third (which should have the response disposition without the quotes) has quotes around it. Then I tried running the Text Import Wizard. The wizard recognizes the file type, and comes up with this syntax. GET DATA /TYPE=TXT /FILE='spss.txt' /DELCASE=LINE /DELIMITERS=" ;," /QUALIFIER='"' /ARRANGEMENT=DELIMITED /FIRSTCASE=1 /IMPORTCASE=ALL /VARIABLES= The only difference I see is in the DELIMITERS line, where my original file just specified comma, and the Text Wizard created comma, semicolon, space. When I run the syntax from the Wizard, the values in the variables look OK, but there is an extra variable created before the data variable, and another before the disposition variable. I tried changing the DELIMITERS line in the original to /DELIMITERS=";," (there are semicolons at the end of each data record), but that made no difference. Then I edited the data file to take out the extra spaces ", " => "," . The import now works correctly, using the original /DELIMITERS="," , and the labeling problems were even fixed. But it takes a long time to do the edit, and as we are still in the middle of fielding I'm wondering if there is a simpler solution by changing the syntax file. It seems that SPSS GET DATA doesn't allow for multiple consecutive delimiters to be treated as one, and now that I've reread the Text Import Wizard information I can see that is documented. I guess I can live with what I've figured out - until the vendor fixes their export - but if anyone has any additional insights I'd appreciate it. Thanks Mike ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
Hi Mike,
Suggestion: Go here http://spssx-discussion.1045642.n5.nabble.com/Problems-with-syntax-to-import-from-CSV-file-delimiters-and-qualifiers-tc5719777.html (your thread) on Nabble: Click the More tab and attach a small subset of the text file . That is easier than trying to determine what other anomalies might exist on the file. Don't know that I will touch it tonight because I put in a 12 hr day of coding and it is Beer-O'clock + ~ 3 hrs in PA. Ready to decompress and watch some movies. David
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
I've found it sometimes helps to import csv files into Excel, save as an
Excel file, and import as an Excel file into SPSS. Excel seems to parse csv variable name column headings better than SPSS. -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso Sent: April-26-13 6:59 PM To: [hidden email] Subject: Re: Problems with syntax to import from CSV file - delimiters and qualifiers Hi Mike, Suggestion: Go here http://spssx-discussion.1045642.n5.nabble.com/Problems-with-syntax-to-import -from-CSV-file-delimiters-and-qualifiers-tc5719777.html (your thread) on Nabble: Click the More tab and attach a small subset of the text file . That is easier than trying to determine what other anomalies might exist on the file. Don't know that I will touch it tonight because I put in a 12 hr day of coding and it is Beer-O'clock + ~ 3 hrs in PA. Ready to decompress and watch some movies. David Mike Pritchard wrote > My survey data vendor provides exports as .sav and also .sps for a > data file that is in CSV format (sort-of). I normally use the .sav > file but there is something going on with the labeling, so I'm trying > to see what I can do with the data file and the syntax. The support > folks at the vendor who could help with this have left for the day, so > I'm hoping that the list can give me a little guidance. > > The data file is structured like this: > "1", "11/Apr/2013", "Partial", "", "", "English", > > The part of the syntax file that I'm confused over is this: > GET DATA > /TYPE=TXT > /FILE="spss.txt" > /DELCASE=LINE > /DELIMITERS="," > /QUALIFIER='"' > /ARRANGEMENT=DELIMITED > /FIRSTCASE=1 > /IMPORTCASE=ALL > /VARIABLES= > > When I run the syntax, I get incorrect values in the appropriate > variables. > The first variable is OK, the second (type date) is blank, the third > (which should have the response disposition without the quotes) has > quotes around it. > > Then I tried running the Text Import Wizard. The wizard recognizes > the file type, and comes up with this syntax. > GET DATA > /TYPE=TXT > /FILE='spss.txt' > /DELCASE=LINE > /DELIMITERS=" ;," > /QUALIFIER='"' > /ARRANGEMENT=DELIMITED > /FIRSTCASE=1 > /IMPORTCASE=ALL > /VARIABLES= > > The only difference I see is in the DELIMITERS line, where my original > file just specified comma, and the Text Wizard created comma, > semicolon, space. > > When I run the syntax from the Wizard, the values in the variables > look OK, but there is an extra variable created before the data > variable, and another before the disposition variable. > > I tried changing the DELIMITERS line in the original to > /DELIMITERS=";," > (there are semicolons at the end of each data record), but that made > no difference. > > Then I edited the data file to take out the extra spaces ", " => "," . > The import now works correctly, using the original /DELIMITERS="," , and > the labeling problems were even fixed. But it takes a long time to do > the edit, and as we are still in the middle of fielding I'm wondering > if there is a simpler solution by changing the syntax file. It seems > that SPSS GET DATA doesn't allow for multiple consecutive delimiters > to be treated as one, and now that I've reread the Text Import Wizard > information I can see that is documented. > > I guess I can live with what I've figured out - until the vendor fixes > their export - but if anyone has any additional insights I'd > appreciate it. > > Thanks > Mike > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA > (not to SPSSX-L), with no body text except the command. To leave the > list, send the command SIGNOFF SPSSX-L For a list of commands to > manage subscriptions, send the command INFO REFCARD ----- Please reply to the list and not to my personal email. Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Problems-with-syntax-to-import -from-CSV-file-delimiters-and-qualifiers-tp5719777p5719779.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Thanks David (Wasserman). I was wondering about Excel as an intermediary. I gave it a quick try, but I'd have to play with it more to get it working properly - the variables didn't get imported correctly into SPSS. The main advantage over what I already did would be speed (if I need to do this a lot) as the editor was pretty slow. But I'm hoping I don't have to do it more than once or twice, and also that the vendor fixes the problem. Thanks David (Marso). I'm attaching a file with 3 records that show the issue with multiple consecutive delimiters/qualifiers. It would be nice to know the SPSS syntax that could deal with the situation, but not high priority. Beer O-Clock is rolling around here in Seattle too... Mike |
spssExtract.csv
Ruben, I don't know why the file upload didn't work before, and I don't know if this will either :) Mike |
Administrator
|
Well here's one problem. In the third line of that uploaded file, it appears you want the following to be read as the value of a single variable:
"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.3" But notice that there is a comma after KHTML, so in a CSV file, it will be split into two pieces: 1. "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.31 (KHTML, 2. like Gecko) Chrome/26.0.1410.64 Safari/537.3" I don't see an easy, non-iterative solution to this. You could replace the commas with some other field separator (e.g., the DOS pipe character), but the problem is that you don't want to replace commas like the one I pointed out above. Rather than have someone check each possible replacement in the Search & Replace, I think I would start by replacing ALL, and importing the data. Then inspect the imported file for places where information is in the wrong field (due to having replaced a comma with | where you should not have done so), revert to comma, and try again. With any luck, you'll arrive at the desired result without too many iterations. HTH.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by Mike Pritchard
I haven't been following this thread closely but to follow up on Bruce's reply and to address the specific problem he identified, it seems that records could be read in as a single string variable. That variable could then be searched and instances of "," replaced with something else, like "$". (Of course, a preliminary check on whether $ was appeared anywhere would be a good idea.) Then write the file back out and read it back in using either get data or data list.
Again, from the single string per record read operation, variables need to extracted from the string and this operation, at least superficially, seems to be the same as the question last week to which DaveM posted the 'macro to end all macros' macro. Gene Maguin -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Bruce Weaver Sent: Monday, April 29, 2013 2:14 PM To: [hidden email] Subject: Re: Problems with syntax to import from CSV file - delimiters and qualifiers Well here's one problem. In the third line of that uploaded file, it appears you want the following to be read as the value of a single variable: "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.31 (*KHTML, like Gecko*) Chrome/26.0.1410.64 Safari/537.3" But notice that there is a comma after KHTML, so in a CSV file, it will be split into two pieces: 1. "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.31 (KHTML, 2. like Gecko) Chrome/26.0.1410.64 Safari/537.3" I don't see an easy, non-iterative solution to this. You could replace the commas with some other field separator (e.g., the DOS pipe character), but the problem is that you don't want to replace commas like the one I pointed out above. Rather than have someone check each possible replacement in the Search & Replace, I think I would start by replacing ALL, and importing the data. Then inspect the imported file for places where information is in the wrong field (due to having replaced a comma with | where you should not have done so), revert to comma, and try again. With any luck, you'll arrive at the desired result without too many iterations. HTH. Mike Pritchard wrote > spssExtract.csv <http://spssx-discussion.1045642.n5.nabble.com/file/n5719817/spssExtract.csv> > Ruben, I don't know why the file upload didn't work before, and I > don't know if this will either :) > > Mike ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Problems-with-syntax-to-import-from-CSV-file-delimiters-and-qualifiers-tp5719777p5719821.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
In reply to this post by Bruce Weaver
DATA LIST LIST FILE "G:\Temp2\spssExtract.csv" /v1 to v1000 (1000A300).
EXE. That will parse it into however many string fields required. use ALTER TYPE or format each variable to begin with.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Bruce Weaver
Bruce,
This is a regular format to distribute CSV files with text strings. The double quotes are "text qualifiers", and it appears the vendors software writes any string variables with this qualifier regardless of whether a string has a comma within it or not. This is to allow you to write text strings within csv files that have a comma. As Mike noted, SPSS does not like the extra space between the comma and the next string when using the import wizard (which is indeed not-normal for a csv specification). Replacing [, "] with [,"] makes SPSS's import wizard work just fine (as far as I can tell with the limited set of info from the file). You can either use a command line tool (like sed, http://gnuwin32.sourceforge.net/packages/sed.htm) or have SPSS do this itself (and then resave and re-import the file). Mike, If the data always comes like this we can probably help you to write syntax to clean it up however you want it. If the data comes in inconsistent ways though that is something you need to talk to your vendor about (i.e. we could provide a solution for this week, but if it is inconsistent it might not work next week). I've had similar problems like this before, and it is just typically a vendor who has some automated script write out the files (and they have typically have little clue about data management itself and what a useable file needs to look like). At least though if it is consistent (whether bad or not) you can write up the clean up syntax in SPSS and not have to worry about it. It sounds like you have been going through the GUI whenever you get a new file. This is silly, it isn't much work to write a syntax file to do this for you. Andy |
Andy, 1. I'd like to know how to have SPSS do the work. I was unable to get the import to work properly without doing the external editing. 2. I don't normally have to do any work with the GUI to import the files created by my vendor. This is a temporary fix while I'm waiting for them to resolve the issues with the .sav file that caused me to turn to the syntax and flat data file. However, sometimes it is better to do some work on the syntax file to improve the labeling etc. I haven't had to do that for a while, but in the past the CSV file format hasn't caused problems. Thanks Mike |
The wizard tries to be smart about what
to use as delimiters, and in this case suggests that maybe both commas
and spaces are delimiters. You can deselect space as a delimiter, and it
should then read the file correctly. However, I still had issues reading
the one line sample. I'm not sure what the cause of that problem is.
Rick Oliver Senior Information Developer IBM Business Analytics (SPSS) E-mail: [hidden email] From: Mike Pritchard <[hidden email]> To: [hidden email], Date: 04/29/2013 02:32 PM Subject: Re: Problems with syntax to import from CSV file - delimiters and qualifiers Sent by: "SPSSX(r) Discussion" <[hidden email]> Andy W wrote > ... You can either use a command line tool (like sed, > http://gnuwin32.sourceforge.net/packages/sed.htm) or * > have SPSS do this itself * > (and then resave and re-import the file). > .... > It sounds like you have been going through the GUI whenever you get a new > file. This is silly, it isn't much work to write a syntax file to do this > for you. > > Andy Andy, 1. I'd like to know how to have SPSS do the work. I was unable to get the import to work properly without doing the external editing. 2. I don't normally have to do any work with the GUI to import the files created by my vendor. This is a temporary fix while I'm waiting for them to resolve the issues with the .sav file that caused me to turn to the syntax and flat data file. However, sometimes it is better to do some work on the syntax file to improve the labeling etc. I haven't had to do that for a while, but in the past the CSV file format hasn't caused problems. Thanks Mike -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Problems-with-syntax-to-import-from-CSV-file-delimiters-and-qualifiers-tp5719777p5719828.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
In reply to this post by David Marso
Here is some relevant documentation David's suggested use of DATA LIST with no delimiters specified. (In the FM, see DATA LIST > Operations > Freefield Data.)
For freefield data without explicitly specified value delimiters: Commas and blanks are interpreted as delimiters between values. Extra blanks are ignored. Multiple commas with or without blank space between them can be used to specify missing data. If a valid value contains commas or blank spaces, enclose the values in quotes. For data with explicitly specified value delimiters (for example,DATA LIST FREE (",")): Multiple delimiters without any intervening space can be used to specify missing data. The specified delimiters cannot occur within a data value, even if you enclose the value in quotes. Note: Freefield format with specified value delimiters is typically used to read data in text format written by a computer program, not for data manually entered in a text editor.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by Rick Oliver-3
Rick, I tried deselecting the space as delimiter, and that didn’t work. I’ve just tried importing into Excel (to help as I check out alternative exports), and that didn’t work properly either with a full file – until I made the edits to change the comma space to just comma. After review the options, for this time, I think the best approach will be to create a new export, redo the labeling work, and then make any further corrections to the generated syntax file. Thanks Mike From: Rick Oliver [via SPSSX Discussion] [mailto:[hidden email]] The wizard tries to be smart about what to use as delimiters, and in this case suggests that maybe both commas and spaces are delimiters. You can deselect space as a delimiter, and it should then read the file correctly. However, I still had issues reading the one line sample. I'm not sure what the cause of that problem is.
If you reply to this email, your message will be added to the discussion below: To unsubscribe from Problems with syntax to import from CSV file - delimiters and qualifiers, click here. |
Free forum by Nabble | Edit this page |