> Actually variable names can be a bit more general than this regex suggests.
> For example, accented or non-Western characters are valid in the first
> position. Even circled Hangul characters are valid in a name. But there are
> a few reserved names: ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO,
> and WITH. Names are limited to 64 bytes in length.

Yes, after I hit 'send' I realized that, e.g., a varName that is or starts with the Greek mu character would not be matched:

>>> import re
>>> re.match("[a-z]+", unichr(956), re.U)  # no match

I thought that the re.UNICODE flag generalized the [a-z] class to whatever is defined as a letter in the Unicode database, and that the absence of such a flag was an implicit re.ASCII flag. I was convinced that it worked that way, but apparently I was wrong. Also, the regex needs to be a unicode string. Thanks for the tip.

The function below works with unicode strings too, and it takes care of reserved words and length. But is the 64 limit a number of characters or a number of bytes?

def isValid(varName):
    reserved = 'all and by eq ge gt le lt ne not or to with'.split()
    m = re.match(ur"^#?[^\W\s\d_]+\w*$", varName, re.I | re.U)
    if m and len(m.group(0)) <= 64 and m.group(0) not in reserved:
        return True
    return False

>>> isValid("#scratch")
True
>>> isValid(unichr(956))
True
>>> isValid("$sysmis")
False
>>> isValid("ike&tina")
False
>>> isValid("_wrong")
False
>>> isValid("var1")
True
>>> isValid(unichr(956) + "blah123")
True
>>>

And I just learnt that with Ponyguruma I can do: \p{Greek}

def isValid(varName):
    m = re.match(ur"^#?[^\W\s\d_]+\w*$", varName, re.I | re.U)
    return True if m and len(m.group(0)) <= 64 else False

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
<snip>
> def isValid(varName):
>     reserved = 'all and by eq ge gt le lt ne not or to with'.split()
>     m = re.match(ur"^#?[^\W\s\d_]+\w*$", varName, re.I | re.U)
>     if m and len(m.group(0)) <= 64 and m.group(0) not in reserved:
>         return True
>     return False

Probably the number of bytes should be counted, so assuming UTF-8 encoding:

    if m and len(m.group(0).encode("utf-8")) <= 64 and m.group(0) not in reserved:

> >>> isValid("#scratch")
> True
> >>> isValid(unichr(956))
> True
> >>> isValid("$sysmis")
> False
> >>> isValid("ike&tina")
> False
> >>> isValid("_wrong")
> False
> >>> isValid("var1")
> True
> >>> isValid(unichr(956) + "blah123")
> True
> >>>
>
> And I just learnt that with Ponyguruma I can do: \p{Greek}
>
> def isValid(varName):
>     m = re.match(ur"^#?[^\W\s\d_]+\w*$", varName, re.I | re.U)
>     return True if m and len(m.group(0)) <= 64 else False
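
For anyone reading this thread on Python 3, here is a minimal sketch of the same check with the length counted in bytes, as suggested above. It is not the original poster's code: the names is_valid, NAME_RE, RESERVED and MAX_BYTES are made up for illustration, UTF-8 is assumed as the encoding (as in the reply above), and the reserved-word comparison is made case-insensitive, which is an assumption on my part. In Python 3, str patterns are Unicode-aware by default, so the ur"" prefix and re.U are no longer needed, and the \s inside the negated class is dropped because \W already covers whitespace.

```python
import re

# Reserved words quoted earlier in the thread. Comparing them
# case-insensitively is an assumption of this sketch.
RESERVED = frozenset("all and by eq ge gt le lt ne not or to with".split())

# Optional leading '#', then one or more letters, then any word characters.
# [^\W\d_] means "a word character that is neither a digit nor '_'",
# i.e. a letter in any script.
NAME_RE = re.compile(r"#?[^\W\d_]+\w*")

# Limit stated in the thread: 64 bytes, not 64 characters.
MAX_BYTES = 64


def is_valid(var_name, encoding="utf-8"):
    """Return True if var_name passes the checks discussed in this thread."""
    if NAME_RE.fullmatch(var_name) is None:
        return False
    if var_name.lower() in RESERVED:
        return False
    # Count encoded bytes, so multi-byte characters use up the limit faster.
    return len(var_name.encode(encoding)) <= MAX_BYTES
```

The difference from counting characters shows up with non-ASCII names: a name made of 40 Greek mu characters has 40 characters but 80 bytes in UTF-8, so it is rejected here, while it would pass a len() <= 64 test on the string itself.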