We had some postings recently on comments in syntax.
Is there perhaps already a Python example that checks the spelling in comments? If not, is there a spell checker in Python? One could write SPSS syntax that reads another syntax file and passes the string to Python.

--
Art Kendall
Social Research Consultants
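As a rough illustration of the "read another syntax file" step, here is a minimal Python 3 sketch that pulls comment commands out of SPSS syntax text so they can be passed to a checker. It assumes a simplified version of the SPSS rules: a comment starts with `*` or `COMMENT` at the beginning of a line and ends at the first line ending in a period, or at a blank line. Inline `/* ... */` comments are not handled, a command continuation line that happens to begin with `*` would be misread as a comment, and the function name `spss_comments` is hypothetical.

```python
import re

# Simplified, assumed SPSS rules: a comment command starts with "*" or the
# keyword COMMENT at the start of a line and ends at the first line that ends
# with a period, or at a blank line. Inline /* ... */ comments are ignored.
COMMENT_START = re.compile(r"^\s*(\*|COMMENT\b)", re.IGNORECASE)


def spss_comments(syntax_text):
    """Return the text of each comment command, without its * or COMMENT."""
    comments = []
    current = None  # lines of the comment being collected, or None
    for line in syntax_text.splitlines():
        stripped = line.strip()
        if current is None:
            match = COMMENT_START.match(line)
            if not match:
                continue  # an ordinary command line
            current = [line[match.end():].strip()]
        elif not stripped:
            # A blank line ends an unterminated comment.
            comments.append(" ".join(current).strip())
            current = None
            continue
        else:
            current.append(stripped)
        if stripped.endswith("."):
            comments.append(" ".join(current).strip())
            current = None
    if current is not None:
        comments.append(" ".join(current).strip())
    return comments


if __name__ == "__main__":
    sample = "* Check teh age variable\n  before recoding.\nFREQUENCIES age.\n"
    print(spss_comments(sample))
```

To check a real file, the text could be read with something like `open("analysis.sps", encoding="utf-8").read()` (the file name is only an example) and the returned strings handed to whatever spell checker is used.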
There is no spell checker in the Python
standard library, but there are a number of third-party modules that could
be used to build such a tool. I have not used any of them myself.
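For a quick standard-library-only "typo corrector" (not a real spell checker), `difflib.get_close_matches` can suggest the nearest words from a reference word list. This is a minimal sketch: the helper names `build_vocabulary` and `suggest` are hypothetical, the reference text stands in for a proper dictionary, and the 0.75 similarity cutoff is an arbitrary choice.

```python
import difflib
import re


def build_vocabulary(reference_text):
    """Collect the lowercase words that occur in a reference text."""
    return set(re.findall(r"[a-z]+", reference_text.lower()))


def suggest(word, vocabulary, n=3, cutoff=0.75):
    """Return None for a known word, else up to n similar known words."""
    lowered = word.lower()
    if lowered in vocabulary:
        return None
    # get_close_matches ranks candidates by SequenceMatcher.ratio() and
    # keeps only those scoring at least `cutoff`.
    return difflib.get_close_matches(
        lowered, sorted(vocabulary), n=n, cutoff=cutoff
    )


if __name__ == "__main__":
    vocab = build_vocabulary("The names should reveal the intent of the variable.")
    for w in ["shoult", "intent", "xyzzy"]:
        print(w, "->", suggest(w, vocab))
```

The quality of the suggestions depends almost entirely on the word list; a real dictionary file would do much better than a single paragraph.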
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
phone: 720-342-5621

From: Art Kendall
Date: 01/18/2014 07:34 AM
Subject: [SPSSX-L] Can python be used to apply a spell check to comments in syntax.
>Subject: Re: [SPSSX-L] Can python be used to apply a spell check to comments in syntax.
I checked out 'whoosh' (for the fun of it!) and used one book to create an index. Then I gave it some text with deliberate errors; the output is below. I think the results would improve if you gave it more data to index. I am not sure why 'reads' comes up. In any case, the Whoosh documentation says: "Currently the suggestion engine is more like a “typo corrector” than a real “spell checker” since it doesn’t do the kind of sophisticated phonetic matching or semantic/contextual analysis a good spell checker might. However, it is still very useful." [http://pythonhosted.org/Whoosh/spelling.html]

The code is here: http://pastebin.com/k4ACrNgy, but I have also pasted it below.

Id --> Did you mean: did
iz --> Did you mean: viz
saj --> Did you mean: say
thad --> Did you mean: had
shoult --> Did you mean: should
intent --> Did you mean: infant
impress --> Did you mean: express
saves --> Did you mean: save
reads --> Did you mean: read
happier --> Did you mean: happen
variable --> Did you mean: valuable
function --> Did you mean: fiction
big --> Did you mean: beg
com --> Did you mean: come
ment --> Did you mean: men
intent --> Did you mean: infant

import codecs
import io
import os.path
import re
import string
import urllib

from whoosh import index, qparser
from whoosh.fields import Schema, ID, TEXT
from whoosh.index import open_dir


def write_index(src_paths, dst_dir):
    """Write search index"""
    schema = Schema(path=ID(unique=True, stored=True),
                    content=TEXT(spelling=True))
    ix = index.create_in(dst_dir, schema=schema)
    writer = ix.writer()
    for src_path in src_paths:
        add_doc(writer, src_path)
    writer.commit()


def strip_punctuation(s):
    """Strip all the punctuation from a string"""
    return re.sub("[%s]+" % re.escape(string.punctuation), "", s)


def add_doc(writer, path):
    """Add utf-8 encoded document to index"""
    fileObj = codecs.open(path, encoding="utf-8")
    content = strip_punctuation(fileObj.read())
    fileObj.close()
    for word in content.split():
        writer.add_document(path=path, content=word)


def parse_string(qstring, ix):
    """Parse the user query string"""
    # ix is the opened index; naming it "index" would shadow whoosh.index
    parser = qparser.QueryParser("content", ix.schema)
    q = parser.parse(qstring)
    with ix.searcher() as s:
        # Try correcting the query
        corrected = s.correct_query(q, qstring)
        if corrected.query != q:
            print qstring, "--> Did you mean:", corrected.string


def extract_comments(fileObj):
    """Extract comments from spss syntax and strip out punctuation"""
    # startswith() is safe on empty lines, unlike line[0]
    words = [word
             for line in fileObj if line.startswith(u"*")
             for word in strip_punctuation(line).split()]
    fileObj.close()
    return words


syntax = u"""\
* Id iz jajaja to saj thad names shoult reveal intent. What we want to impress upon you is that
* we are serious about this. Choosing good names takes time but saves more than it takes.
* So take care with your names and change them when you find better ones. Everyone who
* reads your code (including you) will be happier if you do.
* The name of a variable, function, or class, should answer all the big questions. It
* should tell you why it exists, what it does, and how it is used. If a name requires a com-
* ment, then the name does not reveal its intent."""

if __name__ == "__main__":
    url = "http://www.gutenberg.org/ebooks/97.txt.utf-8"  # Flatland
    dst_dir = u"d:/temp"
    book = os.path.join(dst_dir, u"book.txt")
    if not os.path.exists(book):
        urllib.urlretrieve(url, book)
        write_index([book], dst_dir)
    # io.StringIO accepts unicode text (cStringIO fails on non-ASCII)
    comments = extract_comments(io.StringIO(syntax))
    ix = open_dir(dst_dir)  # open index from file
    for comment in comments:
        parse_string(comment, ix)
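Suggestions for correctly spelled words such as 'reads' are consistent with how any index-based typo corrector behaves: a word that does not occur in the indexed text is treated as a possible typo and mapped to the nearest word that does. The sketch below illustrates that idea with a plain Levenshtein edit distance. It is not Whoosh's implementation, and the names `edit_distance`, `correct` and the `maxdist=2` limit are illustrative choices.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def correct(word, vocabulary, maxdist=2):
    """Return the word if known; otherwise the nearest known word within
    maxdist edits (ties broken alphabetically), or None if there is none."""
    word = word.lower()
    if word in vocabulary:
        return word
    candidates = sorted((edit_distance(word, v), v) for v in vocabulary)
    if candidates and candidates[0][0] <= maxdist:
        return candidates[0][1]
    return None


if __name__ == "__main__":
    vocab = {"read", "should", "say", "had"}
    for w in ["reads", "thad", "Had", "xyzzy"]:
        print(w, "->", correct(w, vocab))
```

With a vocabulary that contains "read" but not "reads", the correctly spelled "reads" is one edit away from a known word and gets "corrected", which is why indexing more text (so that more real words are known) should reduce such false alarms.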