Transpose & Restructure Help Please

Re: Transpose & Restructure Help Please

Albert-Jan Roskam
> Actually variable names can be a bit more general than this re suggests.
> For example, accented or nonwestern characters are valid in the first
> position. Even circled Hangul characters are valid in a name. But there
> are a few reserved names:
> ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, and WITH.
> Names are limited to 64 bytes in length.


Yes, after I hit 'send' I realized that, for example, a varName that is or starts with the Greek letter mu would not be matched.

>>> import re
>>> re.match("[a-z]+", unichr(956), re.U)  # no match

I thought that the re.UNICODE flag generalized the [a-z] class to whatever the Unicode database defines as a letter, and that omitting the flag amounted to an implicit re.ASCII flag. I was convinced that it worked that way, but apparently I was wrong: re.U only changes what \w, \W, \d, \s and \b match, and [a-z] stays a literal range. Also, the regex needs to be a unicode string. Thanks for the tip. The isValid function below works with unicode strings too, and it takes care of reserved words and length. But is the limit a number of characters or a number of bytes?
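For what it's worth, the difference between [a-z] and a negated \W class can be checked in a few lines. This is only a minimal sketch for Python 3, where chr replaces unichr and str patterns are Unicode-aware by default; the names range_match and letter_match are just illustrative.

```python
import re

mu = chr(956)  # GREEK SMALL LETTER MU, the same code point as unichr(956)

# [a-z] is a literal range of code points; re.UNICODE does not widen it.
range_match = re.match(r"[a-z]+", mu, re.UNICODE)

# [^\W\d_] means "a word character that is neither a digit nor an
# underscore", which is roughly "any letter" in the Unicode database.
letter_match = re.match(r"[^\W\d_]+", mu)

print(range_match)               # None
print(letter_match is not None)  # True
```

So the flag affects the shorthand classes, not explicit ranges, which is why the pattern has to be built from \W rather than from [a-z].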


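def isValid(varName):
    # Reserved keywords are case-insensitive, so compare in lower case.
    reserved = 'all and by eq ge gt le lt ne not or to with'.split()
    # Optional leading '#' (scratch variable), letter-like characters, then word characters.
    m = re.match(ur"^#?[^\W\s\d_]+\w*$", varName, re.I | re.U)
    if m and len(m.group(0)) <= 64 and m.group(0).lower() not in reserved:
        return True
    return False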

>>> isValid("#scratch")
True
>>> isValid(unichr(956))
True
>>> isValid("$sysmis")
False
>>> isValid("ike&tina")
False
>>> isValid("_wrong")
False
>>> isValid("var1")
True
>>> isValid(unichr(956) + "blah123")
True
>>>

And I just learnt that with Ponyguruma I can do: \p{Greek}
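As far as I know the standard re module has no \p{...} property classes, so without a third-party engine a script test has to be approximated some other way. The sketch below uses the Unicode character name from the standard unicodedata module as a rough stand-in. The helper name is_greek_letter is made up for illustration, and a name-prefix check is an assumption, not the same thing as a real \p{Greek} script property.

```python
import unicodedata


def is_greek_letter(ch):
    # Stand-in for a \p{Greek} test: a letter whose Unicode character
    # name starts with "GREEK". This is only an approximation.
    return ch.isalpha() and unicodedata.name(ch, "").startswith("GREEK")


print(is_greek_letter(chr(956)))   # True: GREEK SMALL LETTER MU
print(is_greek_letter(chr(0xB5)))  # False: MICRO SIGN, a look-alike
print(is_greek_letter("m"))        # False
```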


def isValid(varName):
    m = re.match(ur"^#?[^\W\s\d_]+\w*$", varName, re.I | re.U)
    return bool(m and len(m.group(0)) <= 64)

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Transpose & Restructure Help Please

Albert-Jan Roskam
<snip>

>
> def isValid(varName):
>     reserved = 'all and by eq ge gt le lt ne not or to with'.split()
>     m = re.match(ur"^#?[^\W\s\d_]+\w*$", varName, re.I | re.U)
>     if m and len(m.group(0)) <= 64 and m.group(0) not in reserved:
>         return True
>     return False

The limit is stated in bytes, so the byte count is probably what should be checked. Assuming UTF-8 encoding:
    if m and len(m.group(0).encode("utf-8")) <= 64 and m.group(0).lower() not in reserved:
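Putting the pieces together, here is a hedged sketch of the same check for Python 3, where the ur"" prefix and unichr no longer exist. The names is_valid, RESERVED, NAME_RE and MAX_BYTES are my own, UTF-8 is an assumption about the file encoding, and the function covers only the rules discussed in this thread, not every SPSS naming rule.

```python
import re

RESERVED = frozenset("all and by eq ge gt le lt ne not or to with".split())
# Optional '#', one letter-like character, then any word characters.
NAME_RE = re.compile(r"#?[^\W\d_]\w*")
MAX_BYTES = 64  # the limit quoted in this thread


def is_valid(var_name, encoding="utf-8"):
    """Return True if var_name passes the checks discussed in this thread."""
    if NAME_RE.fullmatch(var_name) is None:
        return False
    # Reserved keywords are compared case-insensitively.
    if var_name.lower() in RESERVED:
        return False
    # Count encoded bytes, not characters.
    return len(var_name.encode(encoding)) <= MAX_BYTES


mu = chr(956)
print(is_valid("#scratch"))  # True
print(is_valid("ALL"))       # False
print(is_valid(mu * 32))     # True: 32 characters, 64 bytes in UTF-8
print(is_valid(mu * 33))     # False: 33 characters, 66 bytes in UTF-8
```

The last two calls show why the distinction matters: a name can be well under 64 characters and still exceed 64 bytes.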