I have a data file with several string columns. When I export using the
save translate to use a pipe-delimiter for a text file, nulls have a space between the delimiters. Is there a way to eliminate the embedded blanks. No python please. listed below is the syntax & below that is a mocked up example of the text file. Thanks in advance. David, SAVE TRANSLATE OUTFILE='A:\filename.txt' /TYPE=CSV /ENCODING='Locale' /MAP /REPLACE /FIELDNAMES /CELLS=VALUES /textoptions delimiter='|'. data|data| |data| | |data ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Look at the Replace function in the String function section of the Universals chapter of the syntax reference.
Gene Maguin -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Wright Sent: Thursday, May 29, 2014 2:09 PM To: [hidden email] Subject: save translate pipe delimited leaves embedded space on nulls I have a data file with several string columns. When I export using the save translate to use a pipe-delimiter for a text file, nulls have a space between the delimiters. Is there a way to eliminate the embedded blanks. No python please. listed below is the syntax & below that is a mocked up example of the text file. Thanks in advance. David, SAVE TRANSLATE OUTFILE='A:\filename.txt' /TYPE=CSV /ENCODING='Locale' /MAP /REPLACE /FIELDNAMES /CELLS=VALUES /textoptions delimiter='|'. data|data| |data| | |data ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by wsu_wright
I suppose by nulls you mean completely
blank string variables. I don't know of any way to control that with
SAVE TRANSLATE other than by postprocessing the file. You could read
the generated text file lines as a single string variable, and use the
replace function to change the | | fields to ||. WRITE could put
out a text file, but it would be fixed width output.
Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] phone: 720-342-5621 From: David Wright <[hidden email]> To: [hidden email], Date: 05/29/2014 12:09 PM Subject: [SPSSX-L] save translate pipe delimited leaves embedded space on nulls Sent by: "SPSSX(r) Discussion" <[hidden email]> I have a data file with several string columns. When I export using the save translate to use a pipe-delimiter for a text file, nulls have a space between the delimiters. Is there a way to eliminate the embedded blanks. No python please. listed below is the syntax & below that is a mocked up example of the text file. Thanks in advance. David, SAVE TRANSLATE OUTFILE='A:\filename.txt' /TYPE=CSV /ENCODING='Locale' /MAP /REPLACE /FIELDNAMES /CELLS=VALUES /textoptions delimiter='|'. data|data| |data| | |data ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by wsu_wright
The issue is unrelated to the pipe delimiter.
This occurs with any delimiter, and it occurs for both null strings and
missing numeric values.
I don't know why the blank space is being inserted. There is nothing in the SAVE TRANSLATE syntax that will remove it. Under what circumstances does this present a problem? Rick Oliver Senior Information Developer IBM Business Analytics (SPSS) E-mail: [hidden email] From: David Wright <[hidden email]> To: [hidden email], Date: 05/29/2014 01:13 PM Subject: save translate pipe delimited leaves embedded space on nulls Sent by: "SPSSX(r) Discussion" <[hidden email]> I have a data file with several string columns. When I export using the save translate to use a pipe-delimiter for a text file, nulls have a space between the delimiters. Is there a way to eliminate the embedded blanks. No python please. listed below is the syntax & below that is a mocked up example of the text file. Thanks in advance. David, SAVE TRANSLATE OUTFILE='A:\filename.txt' /TYPE=CSV /ENCODING='Locale' /MAP /REPLACE /FIELDNAMES /CELLS=VALUES /textoptions delimiter='|'. data|data| |data| | |data ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Jon K Peck
The results with WRITE would produce even
more blank spaces, since null strings would be represented by a number
of blank spaces equal to the defined width of the string variable.
Rick Oliver Senior Information Developer IBM Business Analytics (SPSS) E-mail: [hidden email] From: Jon K Peck/Chicago/IBM@IBMUS To: [hidden email], Date: 05/29/2014 01:24 PM Subject: Re: save translate pipe delimited leaves embedded space on nulls Sent by: "SPSSX(r) Discussion" <[hidden email]> I suppose by nulls you mean completely blank string variables. I don't know of any way to control that with SAVE TRANSLATE other than by postprocessing the file. You could read the generated text file lines as a single string variable, and use the replace function to change the | | fields to ||. WRITE could put out a text file, but it would be fixed width output. Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] phone: 720-342-5621 From: David Wright <[hidden email]> To: [hidden email], Date: 05/29/2014 12:09 PM Subject: [SPSSX-L] save translate pipe delimited leaves embedded space on nulls Sent by: "SPSSX(r) Discussion" <[hidden email]> I have a data file with several string columns. When I export using the save translate to use a pipe-delimiter for a text file, nulls have a space between the delimiters. Is there a way to eliminate the embedded blanks. No python please. listed below is the syntax & below that is a mocked up example of the text file. Thanks in advance. David, SAVE TRANSLATE OUTFILE='A:\filename.txt' /TYPE=CSV /ENCODING='Locale' /MAP /REPLACE /FIELDNAMES /CELLS=VALUES /textoptions delimiter='|'. data|data| |data| | |data ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
This post was updated on .
I suspect Jon meant to read as one long string.
Use REPLACE and then use WRITE to pop out a single string variable. There would only be trailing blanks at the end of the record. Again it would be good to know why the unaltered file presents any issue.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Rick Oliver-3
Yeah, that is a sub-optimal way. To back up a bit, and to be sure I understand a word, is ‘null’ being used to refer to an A1 variable whose value is ‘ ‘, i.e.,
a blank character? If, and other than not wanting them in the record, isn’t ‘ ‘ a valid value unless missing values of ‘ ‘ have been defined for that variable?
David, perhaps this is what you meant by your first line.
Build a string variable concantenating the values and then use replace to get rid of blanks. And, depending on how many blanks there, maybe use Alter type to
trim off the excess blanks at the end. Gene Maguin From: SPSSX(r) Discussion [mailto:[hidden email]]
On Behalf Of Rick Oliver The results with WRITE would produce even more blank spaces, since null strings would be represented by a number of blank spaces equal to the defined width of the string variable.
|
Administrator
|
I presume I be the David in question?
SAVE TRANSLATE OUTFILE 'somefile' ....... DATA LIST FIXED FILE='somefile' /junk (A8000). COMPUTE junk=REPLACE(junk,"| |","||"). ALTER TYPE junk (AMIN). WRITE OUTFILE 'someotherfile maybe contentions' / junk (A8000). EXECUTE.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Use SAVE TRANSLATE again instead of WRITE
and you won't have to bother with ALTER TYPE or worry about trailing blanks.
Also I think that ALTER TYPE is wasted if you're just going to redefined the width as A8000 again on WRITE. Rick Oliver Senior Information Developer IBM Business Analytics (SPSS) E-mail: [hidden email] From: David Marso <[hidden email]> To: [hidden email], Date: 05/29/2014 02:37 PM Subject: Re: save translate pipe delimited leaves embedded space on nulls Sent by: "SPSSX(r) Discussion" <[hidden email]> I presume I be the David in question? SAVE TRANSLATE OUTFILE 'somefile' ....... DATA LIST FIXED FILE='somefile' /junk (A8000). COMPUTE junk=REPLACE(junk,"| |","||"). ALTER TYPE junk (AMIN). WRITE OUTFILE 'someotherfile maybe contentions' / junk (A8000). EXECUTE. Maguin, Eugene wrote > Yeah, that is a sub-optimal way. To back up a bit, and to be sure I > understand a word, is 'null' being used to refer to an A1 variable whose > value is ' ', i.e., a blank character? If, and other than not wanting them > in the record, isn't ' ' a valid value unless missing values of ' ' have > been defined for that variable? > > David, perhaps this is what you meant by your first line. > Build a string variable concantenating the values and then use replace to > get rid of blanks. And, depending on how many blanks there, maybe use > Alter type to trim off the excess blanks at the end. > Gene Maguin > > From: SPSSX(r) Discussion [mailto: > SPSSX-L@.UGA > ] On Behalf Of Rick Oliver > Sent: Thursday, May 29, 2014 2:31 PM > To: > SPSSX-L@.UGA > Subject: Re: save translate pipe delimited leaves embedded space on nulls > > The results with WRITE would produce even more blank spaces, since null > strings would be represented by a number of blank spaces equal to the > defined width of the string variable. > > Rick Oliver > Senior Information Developer > IBM Business Analytics (SPSS) > E-mail: > oliverr@.ibm > > > > From: Jon K Peck/Chicago/IBM@IBMUS > To: > SPSSX-L@.uga > , > Date: 05/29/2014 01:24 PM > Subject: Re: save translate pipe delimited leaves embedded space on > nulls > Sent by: "SPSSX(r) Discussion" < > SPSSX-L@.uga > > > ________________________________ > > > > I suppose by nulls you mean completely blank string variables. I don't > know of any way to control that with SAVE TRANSLATE other than by > postprocessing the file. You could read the generated text file lines as > a single string variable, and use the replace function to change the | | > fields to ||. WRITE could put out a text file, but it would be fixed > width output. > > > > > Jon Peck (no "h") aka Kim > Senior Software Engineer, IBM > peck@.ibm > phone: 720-342-5621 > > > > > From: David Wright < > coxspss@ > > > To: > SPSSX-L@.uga > , > Date: 05/29/2014 12:09 PM > Subject: [SPSSX-L] save translate pipe delimited leaves embedded > space on nulls > Sent by: "SPSSX(r) Discussion" < > SPSSX-L@.uga > > > ________________________________ > > > > I have a data file with several string columns. When I export using the > save translate to use a pipe-delimiter for a text file, nulls have a > space between the delimiters. Is there a way to eliminate the embedded > blanks. No python please. > > listed below is the syntax & below that is a mocked up example of the > text file. > > Thanks in advance. > > David, > > > > SAVE TRANSLATE OUTFILE='A:\filename.txt' > /TYPE=CSV /ENCODING='Locale' /MAP /REPLACE /FIELDNAMES > /CELLS=VALUES /textoptions delimiter='|'. > > > > data|data| |data| | |data > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA > (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD ----- Please reply to the list and not to my personal email. Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/save-translate-pipe-delimited-leaves-embedded-space-on-nulls-tp5726276p5726282.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by wsu_wright
We are uploading the file to a vendor from our oracle database, they requested a pipe delimited file, it is being used as part of a phone campaign. We have uploaded the file to the vendor and at present have not heard whether it is a 'problem' on their end. Personally, I can't see it as a problem since a null (e.g. in a string based phone number column) would result in no action by the vendor on an affected record, they would merely move to the next record for processing. But I wanted to be prepared in the case we were asked why the embedded space is inserted and, if it was a problem, if we could remove it via the syntax etl that processes the upload file. Thanks. On Thu, May 29, 2014 at 1:30 PM, Rick Oliver wrote: The issue is unrelated to the pipe delimiter. This occurs with any delimiter, and it occurs for both null strings and missing numeric values. I don't know why the blank space is being inserted. There is nothing in the SAVE TRANSLATE syntax that will remove it. Under what circumstances does this present a problem? Rick Oliver Senior Information Developer IBM Business Analytics (SPSS) E-mail: [hidden email] From: David Wright <[hidden email]> To: [hidden email], Date: 05/29/2014 01:13 PM Subject: save translate pipe delimited leaves embedded space on nulls Sent by: "SPSSX(r) Discussion" <[hidden email]> ___________________________________ I have a data file with several string columns. When I export using the save translate to use a pipe-delimiter for a text file, nulls have a space between the delimiters. Is there a way to eliminate the embedded blanks. No python please. listed below is the syntax & below that is a mocked up example of the text file. Thanks in advance. David, SAVE TRANSLATE OUTFILE='A:\filename.txt' /TYPE=CSV /ENCODING='Locale' /MAP /REPLACE /FIELDNAMES /CELLS=VALUES /textoptions delimiter='|'. data|data| |data| | |data ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
If all you are doing is replacing " |" with "|" then a simple command line tool will work. Here I use HOST (in SPSS) calling sed.
*****************************************. FILE HANDLE save /NAME = "C:\Users\andrew.wheeler\Desktop\sed_test". DATA LIST FREE / X1 TO X4 (4A4). BEGIN DATA A B "" D "" "B |" "" "" B "" "" "" D B "" "" "" "" D D END DATA. SAVE TRANSLATE OUTFILE='save\orig.txt' /TYPE=CSV /ENCODING='Locale' /MAP /REPLACE /FIELDNAMES /CELLS=VALUES /textoptions delimiter='|'. HOST COMMAND=['cd "C:\Users\andrew.wheeler\Desktop\sed_test"' 'sed "s/ |/|/g" orig.txt>update.txt']. *****************************************. Here I placed an example where this approach can be annoying, in the second record there is " |" actually in the string field, so the space is condensed. I imagine you could concoct a more complicated regular expression to make sure substitution within quotes does not happen, but it may be harmless depending on your application. The original spaces should not be a big problem, but it may save some disk space for really big files with alot of missing strings. I don't know the performance of this for big files versus reading the data back into SPSS. For the above set of 5 cases a million times over the HOST command worked pretty fast on my machine (in a second or two). |
works as long as you have a shell/os that
includes sed, in which case you probably have awk and other text processing
tools - even Python. In this case, though, they do no more than the
Statistics replace function as suggested yesterday.
Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] phone: 720-342-5621 From: Andy W <[hidden email]> To: [hidden email], Date: 05/30/2014 06:49 AM Subject: Re: [SPSSX-L] save translate pipe delimited leaves embedded space on nulls Sent by: "SPSSX(r) Discussion" <[hidden email]> If all you are doing is replacing " |" with "|" then a simple command line tool will work. Here I use HOST (in SPSS) calling sed. *****************************************. FILE HANDLE save /NAME = "C:\Users\andrew.wheeler\Desktop\sed_test". DATA LIST FREE / X1 TO X4 (4A4). BEGIN DATA A B "" D "" "B |" "" "" B "" "" "" D B "" "" "" "" D D END DATA. SAVE TRANSLATE OUTFILE='save\orig.txt' /TYPE=CSV /ENCODING='Locale' /MAP /REPLACE /FIELDNAMES /CELLS=VALUES /textoptions delimiter='|'. HOST COMMAND=['cd "C:\Users\andrew.wheeler\Desktop\sed_test"' 'sed "s/ |/|/g" orig.txt>update.txt']. *****************************************. Here I placed an example where this approach can be annoying, in the second record there is " |" actually in the string field, so the space is condensed. I imagine you could concoct a more complicated regular expression to make sure substitution within quotes does not happen, but it may be harmless depending on your application. The original spaces should not be a big problem, but it may save some disk space for really big files with alot of missing strings. I don't know the performance of this for big files versus reading the data back into SPSS. For the above set of 5 cases a million times over the HOST command worked pretty fast on my machine (in a second or two). ----- Andy W [hidden email] http://andrewpwheeler.wordpress.com/ -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/save-translate-pipe-delimited-leaves-embedded-space-on-nulls-tp5726276p5726288.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
In reply to this post by Andy W
Even though I am an old DOS user, I was not familiar with SED. So I opened a command window and looked for the help on it. But here's what I found:
C:\bw>sed /? 'sed' is not recognized as an internal or external command, operable program or batch file. Andy, is there a native DOS version of SED on your system? Or did you have to find and install a version for Windows? I see there are such things available--e.g., this one from SourceForge: http://gnuwin32.sourceforge.net/packages/sed.htm Cheers, Bruce
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
I'm on Windows and yes Bruce, I installed the version you linked to (along with the other tools Jon mentioned).
Yes, I agree Jon they do the same thing. Using the command line tools the process is: 1 - Save File 2 - Process File - find & replace (and pipe to a new file at the same time) The workflows previously posted were 1 - Save File 2 - Reread file into SPSS as one long string 3 - Process File - find & replace 4 - Save File Like I said, I tested the program with 5 million cases, and the sed command I posted worked in about 1~2 seconds. I'm just posting as another potential alternative that I find attractive. |
Free forum by Nabble | Edit this page |