Removing all Alpha characters from a string variable

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Removing all Alpha characters from a string variable

Art Kendall
There was a string variable that was supposed to hold a latitude-longitude
pair in parentheses with a comma in between. I have a workaround but I
decided to try another approach.
However, I did not correctly adapt an example for a post in the archives of
this list.
This was intended to take a string variable Location and remove all alpha
characters and put the result in a new variable. All results were $symis.

spssinc trans result=Location2
    /formula "re.sub('[a-zA-Z]', '', Location)".

 The syntax also did not work putting the cleaned out string back in the
same variable.
spssinc trans result=Location
    /formula "re.sub('[a-zA-Z]', '', Location)".

This is some syntax that shows what happens.

* Encoding: UTF-8.
* want to extract the part inside the parentheses to 2 numeric variables.
NEW FILE.
DATASET CLOSE ALL.
data list list /Location (a50).
begin data
    "(13.02823903, -86.15392199) word word"
    "(13.02825303, -86.15263101) something"
    "word, word word(13.02826996, -86.15247301)"
    "something something(13.02831899, -86.15369601)"
    "word,word, word(13.028363, -86.15285296)"
    "(13.02838999, -86.15378503)"
    "(13.02840298, -86.15298397)"
    "(13.02842502, -86.15390296)"
    "(13.02846299, -86.15308899)"
    "(13.028464, -86.15499797)"
    "(13.02859099, -86.15357598)"
    "(13.02867598, -86.15395803)"
    "()"
    "()"
    "()"
    ", ()"
    ", ()"
    "(,)"
    "(,)"
    ""
    ""
END DATA.
*attempt at new method.
* remove all alpha characters.
String Location2 (A50).
spssinc trans result=Location2
    /formula "re.sub('[a-zA-Z]', '', Location)".
LIST /Location /Location2/.

*   old method.
STRING Temp.Location LatPiece LonPiece(A50).
COMPUTE Temp.Location = Location.
* strip off part after CloseParen.
COMPUTE CloseParen = CHAR.INDEX(Temp.Location,")").
COMPUTE Temp.Location = CHAR.SUBSTR (Temp.Location,1,CloseParen).
*    Strip off part before Open Paren.
COMPUTE OpenParen =CHAR.INDEX(Temp.Location,"(").
IF OpenParen GT 1 Temp.Location = CHAR.SUBSTR(Temp.Location, OpenParen,50).
NUMERIC Latitude, Longitude (f12.8).
*    set value in case instance of data entry is not provided for.
COMPUTE Latitude  = -888.
COMPUTE Longitude = -888.
DO IF LENGTH(RTRIM(Temp.Location)) LT 8.
    COMPUTE Latitude  = -999.
    COMPUTE Longitude = -999.
ELSE.
    COMPUTE WhereComma = CHAR.INDEX(Temp.Location,",").
    *    strip off parens.
    COMPUTE Temp.Location = REPLACE(Temp.Location,"(","").
    COMPUTE Temp.Location = REPLACE(Temp.Location,")","").
    COMPUTE LatPiece = CHAR.SUBSTR(Temp.Location,1,WhereComma-2).
    COMPUTE LonPiece = CHAR.SUBSTR(Temp.Location,WhereComma+1,50).
    COMPUTE Latitude = NUMBER(LatPiece,F12.8).
    COMPUTE Longitude= NUMBER(LonPiece,F12.8).
END IF.
VALUE LABELS Latitude, Longitude
    888 'failure in syntax'
    999 'not entered as usable location'.
MISSING VALUES Latitude, Longitude (-999, -888).
List /Location Latitude, Longitude.








-----
Art Kendall
Social Research Consultants
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: Removing all Alpha characters from a string variable

Jon Peck
By default, Location2 would be a numeric variable, so any nonnumeric characters such as (, ), - and comma would be left after removing the alpha characters, so the result would be sysmis.  TRANS changes the type of the result variable to numeric in this case.

This code extracts the numeric fields and returns them as two variables.  If there are fewer than two numeric values, the missing fields will be sysmis.  The expression looks for a sequence of digits, a minus sign and a dot.

spssinc trans result=lat long
/formula "re.findall(r'[0-9-.]+', Location)".




On Mon, Aug 10, 2020 at 11:49 AM Art Kendall <[hidden email]> wrote:
There was a string variable that was supposed to hold a latitude-longitude
pair in parentheses with a comma in between. I have a workaround but I
decided to try another approach.
However, I did not correctly adapt an example for a post in the archives of
this list.
This was intended to take a string variable Location and remove all alpha
characters and put the result in a new variable. All results were $symis.

spssinc trans result=Location2
    /formula "re.sub('[a-zA-Z]', '', Location)".

 The syntax also did not work putting the cleaned out string back in the
same variable.
spssinc trans result=Location
    /formula "re.sub('[a-zA-Z]', '', Location)".

This is some syntax that shows what happens.

* Encoding: UTF-8.
* want to extract the part inside the parentheses to 2 numeric variables.
NEW FILE.
DATASET CLOSE ALL.
data list list /Location (a50).
begin data
    "(13.02823903, -86.15392199) word word"
    "(13.02825303, -86.15263101) something"
    "word, word word(13.02826996, -86.15247301)"
    "something something(13.02831899, -86.15369601)"
    "word,word, word(13.028363, -86.15285296)"
    "(13.02838999, -86.15378503)"
    "(13.02840298, -86.15298397)"
    "(13.02842502, -86.15390296)"
    "(13.02846299, -86.15308899)"
    "(13.028464, -86.15499797)"
    "(13.02859099, -86.15357598)"
    "(13.02867598, -86.15395803)"
    "()"
    "()"
    "()"
    ", ()"
    ", ()"
    "(,)"
    "(,)"
    ""
    ""
END DATA.
*attempt at new method.
* remove all alpha characters.
String Location2 (A50).
spssinc trans result=Location2
    /formula "re.sub('[a-zA-Z]', '', Location)".
LIST /Location /Location2/.

*   old method.
STRING Temp.Location LatPiece LonPiece(A50).
COMPUTE Temp.Location = Location.
* strip off part after CloseParen.
COMPUTE CloseParen = CHAR.INDEX(Temp.Location,")").
COMPUTE Temp.Location = CHAR.SUBSTR (Temp.Location,1,CloseParen).
*    Strip off part before Open Paren.
COMPUTE OpenParen =CHAR.INDEX(Temp.Location,"(").
IF OpenParen GT 1 Temp.Location = CHAR.SUBSTR(Temp.Location, OpenParen,50).
NUMERIC Latitude, Longitude (f12.8).
*    set value in case instance of data entry is not provided for.
COMPUTE Latitude  = -888.
COMPUTE Longitude = -888.
DO IF LENGTH(RTRIM(Temp.Location)) LT 8.
    COMPUTE Latitude  = -999.
    COMPUTE Longitude = -999.
ELSE.
    COMPUTE WhereComma = CHAR.INDEX(Temp.Location,",").
    *    strip off parens.
    COMPUTE Temp.Location = REPLACE(Temp.Location,"(","").
    COMPUTE Temp.Location = REPLACE(Temp.Location,")","").
    COMPUTE LatPiece = CHAR.SUBSTR(Temp.Location,1,WhereComma-2).
    COMPUTE LonPiece = CHAR.SUBSTR(Temp.Location,WhereComma+1,50).
    COMPUTE Latitude = NUMBER(LatPiece,F12.8).
    COMPUTE Longitude= NUMBER(LonPiece,F12.8).
END IF.
VALUE LABELS Latitude, Longitude
    888 'failure in syntax'
    999 'not entered as usable location'.
MISSING VALUES Latitude, Longitude (-999, -888).
List /Location Latitude, Longitude.








-----
Art Kendall
Social Research Consultants
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


--
Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Removing all Alpha characters from a string variable

Art Kendall
Sorry

Mistake in the top of my post. I should have been clearer.
In the example,syntax farther down it says

String Location2 (A50).
spssinc trans result=Location2
    /formula "re.sub('[a-zA-Z]', '', Location)".

and the result was sysmis.

But WOW.  That snippet is certainly more compact than my original approach.

Thank you.



-----
Art Kendall
Social Research Consultants
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: Removing all Alpha characters from a string variable

Jon Peck
Yes.  You got sysmis because SPSSINC TRANS sets the output variables' types itself.  It can change an existing variable's type as well as creating new variables of the specified type, and the default is numeric.

On Mon, Aug 10, 2020 at 2:59 PM Art Kendall <[hidden email]> wrote:
Sorry

Mistake in the top of my post. I should have been clearer.
In the example,syntax farther down it says

String Location2 (A50).
spssinc trans result=Location2
    /formula "re.sub('[a-zA-Z]', '', Location)".

and the result was sysmis.

But WOW.  That snippet is certainly more compact than my original approach.

Thank you.



-----
Art Kendall
Social Research Consultants
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


--
Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD