Normalize a String1

Libardo López Guzmán
Hi dear Listers,
I would appreciate your help with SPSS v15 syntax to normalize a
string variable (millions of cases). Sorry for the mistake in my previous post.
The problem is that the parts are separated by runs of spaces (1, 2, 3, ...) and
tabs (1, 2, ...), and each run must be replaced with a single space.
Let me show some examples. The string may have up to 12 parts.
Sara                          Smith  Dockter
   Lee   Hunter          Casidy
George Harvey   Mora   Aito
  Marck            Mack

After normalization,

Sara Smith Dockter
Lee Hunter Casidy
George Harvey Mora Aito
Marck Mack

Thanks for your help,

Libardo

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Normalize a String1

Albert-Jan Roskam
Hi,

A not-so-elegant solution is below. It assumes that the string variable with the names is the first variable in the file.
It also assumes that a directory d:/temp is present. It would be nicer to skip the step with the auxiliary txt file.

Cheers!!
Albert-Jan

* sample data.
data list free / oldvalue (a60).
begin data
'Sara                          Smith  Dockter '
'  Lee  Hunter          Casidy'
'George Harvey  Mora  Aito    '
'  Marck            Mack    '
end data.

*actual  code.
compute casenum = $casenum.
exe.
dataset name source.

begin program.
import spss, re

f = open('d:/temp/workfile.txt', 'w')
dataCursor = spss.Cursor([0])
f.write("casenum\tnewvalue\r\n")
for i in range(spss.GetCaseCount()):
    oldval = dataCursor.fetchone()[0]
    newval = re.sub(r"\s+", ' ', oldval)
    newval = newval.strip()
    writestr = str(i + 1) + "\t" + newval + "\r\n"
    f.write(writestr)
dataCursor.close()
f.close()
end program.

GET DATA  /TYPE = TXT /FILE = 'D:\temp\workfile.txt'
 /DELCASE = LINE /DELIMITERS = "\t"
 /ARRANGEMENT = DELIMITED
 /FIRSTCASE = 2 /IMPORTCASE = ALL
 /VARIABLES = casenum F8.0 newvalue A60 .

match files / file = * / file = source / by = casenum.
exe.
dataset close all.
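
For anyone who wants to check the regular-expression step on its own before running it inside SPSS, here is a minimal standalone sketch. The function name normalize_whitespace and the samples list are made up for illustration; only re.sub and strip are taken from the program above, and the tab in the third sample is an assumption added to exercise the tab case.

```python
import re

def normalize_whitespace(value):
    """Collapse every run of whitespace (blanks, tabs) to one blank and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()

# Sample values modeled on the original post; the tab in the third one is added for illustration.
samples = [
    "Sara                          Smith  Dockter ",
    "  Lee  Hunter          Casidy",
    "George Harvey\tMora  Aito    ",
    "  Marck            Mack    ",
]

for s in samples:
    print(normalize_whitespace(s))
```

Because \s matches tabs as well as blanks, no separate tab step is needed in this variant.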






Re: Normalize a String1

Oliver, Richard
The multiple space problem can be solved fairly easily without resorting to Python:
 
loop if index(rtrim(stringvar), "  ")>0.
compute stringvar=replace(stringvar, "  ", " ").
end loop.
 
I thought there was a way to specify tab characters in command syntax, but it isn't what I thought it was; so I don't have a simple syntax solution for the tab problem.

________________________________

From: SPSSX(r) Discussion on behalf of Albert-jan Roskam
Sent: Sat 1/24/2009 12:39 PM
To: [hidden email]
Subject: Re: Normalize a String1



Hi,

A not-so-elegant solution is below. It assumes that the string variable with the names is the first variable in the file.
It also assumes that a directory d:/temp is present. It would be nicer to skip the step with the auxiliary txt file.

Cheers!!
Albert-Jan

* sample data.
data list free / oldvalue (a60).
begin data
'Sara                          Smith  Dockter '
'  Lee  Hunter          Casidy'
'George Harvey  Mora  Aito    '
'  Marck            Mack    '
end data.

*actual  code.
compute casenum = $casenum.
exe.
dataset name source.

begin program.
import spss, re

f = open('d:/temp/workfile.txt', 'w')
dataCursor = spss.Cursor([0])
f.write("casenum\tnewvalue\r\n")
for i in range(spss.GetCaseCount()):
    oldval = dataCursor.fetchone()[0]
    newval = re.sub(r"\s+", ' ', oldval)
    newval = newval.strip()
    writestr = str(i + 1) + "\t" + newval + "\r\n"
    f.write(writestr)
dataCursor.close()
f.close()
end program.

GET DATA  /TYPE = TXT /FILE = 'D:\temp\workfile.txt'
 /DELCASE = LINE /DELIMITERS = "\t"
 /ARRANGEMENT = DELIMITED
 /FIRSTCASE = 2 /IMPORTCASE = ALL
 /VARIABLES = casenum F1.0 newvalue A25 .

match files / file = * / file = source / by = casenum.
exe.
dataset close all.





----- Original Message ----
From: Libardo Lopez <[hidden email]>
To: [hidden email]
Sent: Saturday, January 24, 2009 2:09:47 PM
Subject: Normalize a String1

Hi dear Listers.
I would appreciate your help to obtain a Sintax SPSS v15 to normalize a
String (millions cases). Sorry for my mistake in the past post.
The problem is that i have spaces (1,2,3....), tabulators (1,2....) in
between as separators and it must be replaced with one space.
Let me show some examples. The string may have up to 12 parts.
Sara                          Smith  Dockter
   Lee   Hunter          Casidy
George Harvey   Mora   Aito
  Marck            Mack

After normalization,

Sara Smith Dockter
Lee Hunter Casidy
George Harvey Mora Aito
Marck Mack

Thanks for your help,

Libardo

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

====================To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Normalize a String1

Maguin, Eugene
Richard,

Are you saying that after running the syntax below, the string

'multiple     space       problem'

will go to

'multiple space problem          '?

Thus, the replace command will pull all characters to the right of the '  '
to the left by one place.


>>>The multiple space problem can be solved fairly easily without resorting
to Python:

loop if index(rtrim(stringvar), "  ")>0.
compute stringvar=replace(stringvar, "  ", " ").
end loop.

Gene Maguin


Re: Normalize a String1

Peck, Jon
In reply to this post by Oliver, Richard
To get rid of tabs, run something like
compute strvar = replace(strvar, string(09, pib1),' ').

Do this before replacing blanks in case there are mixed tabs and blanks.
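
To see why the order matters, here is a small sketch in plain Python (not SPSS syntax). The helper collapse_blanks and the sample value are hypothetical; the helper only mimics the loop/replace idea from Richard's post.

```python
def collapse_blanks(value):
    """Replace two blanks with one until no double blank is left."""
    while "  " in value:
        value = value.replace("  ", " ")
    return value

raw = "Lee \t Hunter"  # blank, tab, blank between the names

# Tabs first, then blanks: the whole separator run becomes one blank.
tabs_first = collapse_blanks(raw.replace("\t", " "))

# Blanks first, then tabs: the tab keeps the two blanks apart, so nothing
# is collapsed, and replacing the tab afterwards leaves three blanks.
blanks_first = collapse_blanks(raw).replace("\t", " ")

print(repr(tabs_first))
print(repr(blanks_first))
```

With the tab replaced first, the separator ends up as a single blank; in the other order a run of blanks survives and the blank loop would have to be run a second time.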

(Of course, there is a nicer way to do all this in Python with regular expressions, but I won't go there today.)

HTH,
Jon Peck


Re: Normalize a String1

Oliver, Richard
In reply to this post by Maguin, Eugene
It will keep going through the string value, replacing all instances of two spaces with one space until there are no more instances of two consecutive spaces.
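
A rough way to picture this is the Python sketch below. It is only an emulation, not SPSS: the function name spss_replace_loop and its width argument are invented for illustration, and it assumes the usual behaviour that a string variable keeps its declared width and is padded with trailing blanks.

```python
def spss_replace_loop(value, width):
    """Emulate the LOOP / REPLACE idiom on a fixed-width string variable."""
    # A string variable always holds exactly `width` characters, blank-padded.
    value = value.ljust(width)[:width]
    # loop if index(rtrim(stringvar), "  ") > 0.
    while "  " in value.rstrip(" "):
        # compute stringvar = replace(stringvar, "  ", " ").
        value = value.replace("  ", " ")
        # Storing the shorter result pads it back out to the declared width.
        value = value.ljust(width)
    return value

before = "multiple     space       problem"
after = spss_replace_loop(before, len(before))
print(repr(after))
```

The text moves left as the blanks are squeezed out, and the freed positions end up as trailing blanks, which is why the loop condition tests the rtrim'ed value rather than the whole variable.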


Re: Normalize a String1

Libardo López Guzmán
Thanks so much to all of you.

With your help, the syntax that meets my needs is:


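* Replace tabs with blanks first, so runs mixing tabs and blanks collapse too.
compute oldvalue = replace(oldvalue, string(09, pib1), ' ').

* Strip leading blanks after the tab step, so leading tabs are removed as well.
compute oldvalue = ltrim(oldvalue).

* Collapse each run of blanks to a single blank.
loop if index(rtrim(oldvalue), "  ") > 0.
compute oldvalue = replace(oldvalue, "  ", " ").
end loop.
execute.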

Thanks again,

Libardo
