Can python be used to apply a spell check to comments in syntax.


Art Kendall
We had some postings recently on comments in syntax.

Is there perhaps already a Python example that checks the spelling in comments? 

If not, is there a spell checker for Python? One could write SPSS syntax that reads another syntax file and passes each string to Python.


-- 
Art Kendall
Social Research Consultants
Re: Can python be used to apply a spell check to comments in syntax.

Jon K Peck
There is no spell checker in the Python standard library, but there are a number of third-party modules that could be used to build such a tool.  I have not used any of them myself.
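As a rough illustration of what building such a tool could involve, here is a minimal sketch that uses only the standard library's difflib for closest-match lookup. It is a typo matcher rather than a real spell checker, and the word list `KNOWN_WORDS` and the function `check` are made up for the example; a usable tool would load a proper dictionary or use one of the third-party modules.

```python
import difflib

# Hypothetical, tiny vocabulary; a real tool would load a large word list.
KNOWN_WORDS = ["should", "reveal", "intent", "names", "say", "that", "is", "it", "to"]


def check(word, vocabulary=KNOWN_WORDS, cutoff=0.6):
    """Return (known, suggestions) for one word.

    known is True when the lowercased word is in the vocabulary;
    otherwise suggestions holds up to three close matches (possibly none).
    """
    lowered = word.lower()
    if lowered in vocabulary:
        return True, []
    return False, difflib.get_close_matches(lowered, vocabulary, n=3, cutoff=cutoff)


# Report every word that is not in the vocabulary.
for token in "It iz to saj thad names shoult reveal intent".split():
    known, hints = check(token)
    if not known:
        print(token, "--> Did you mean:", ", ".join(hints) or "(no suggestion)")
```

The quality of the suggestions depends entirely on the vocabulary: any correctly spelled word missing from the list is reported as an error.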


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        Art Kendall <[hidden email]>
To:        [hidden email],
Date:        01/18/2014 07:34 AM
Subject:        [SPSSX-L] Can python be used to apply a spell check to comments in syntax.
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Re: Can python be used to apply a spell check to comments in syntax.

Albert-Jan Roskam
>Subject: Re: [SPSSX-L] Can python be used to apply a spell check to comments in syntax.

>We had some postings recently on comments in syntax.
>
>Is there perhaps already a Python example that checks the spelling in comments?
>
>If not, is there a Spell Check in Python. One could write SPSS syntax that
>reads another syntax file and passes the string to Python.

I tried out 'whoosh' (for the fun of it!) and built an index from a single book. Then I gave it some text with deliberate errors; the output is below. I think the results will improve if you give it more data to index. I'm not sure why 'reads' comes up. Note, however, what the Whoosh documentation says: "Currently the suggestion engine is more like a “typo corrector” than a real “spell checker” since it doesn’t do the kind of sophisticated phonetic matching or semantic/contextual analysis a good spell checker might. However, it is still very useful." [http://pythonhosted.org/Whoosh/spelling.html]

The code is at http://pastebin.com/k4ACrNgy, and it is also pasted below the output.

Id --> Did you mean: did
iz --> Did you mean: viz
saj --> Did you mean: say
thad --> Did you mean: had
shoult --> Did you mean: should
intent --> Did you mean: infant
impress --> Did you mean: express
saves --> Did you mean: save
reads --> Did you mean: read
happier --> Did you mean: happen
variable --> Did you mean: valuable
function --> Did you mean: fiction
big --> Did you mean: beg
com --> Did you mean: come
ment --> Did you mean: men
intent --> Did you mean: infant

import os.path
import urllib
import codecs
import cStringIO as StringIO
import re
import string
from whoosh import index, qparser
from whoosh.fields import Schema, ID, TEXT
from whoosh.index import open_dir

def write_index(src_paths, dst_dir):
    """Write search index"""
    schema = Schema(path=ID(unique=True, stored=True),
                    content=TEXT(spelling=True))
    ix = index.create_in(dst_dir, schema=schema)
    writer = ix.writer()
    for src_path in src_paths:
        add_doc(writer, src_path)
    writer.commit()

def strip_punctuation(s):
    """strip all the punctuation from a string"""
    return re.sub("[%s]*" % re.escape(string.punctuation), "", s)

def add_doc(writer, path):
    """Add utf-8 encoded document to index"""
    fileObj = codecs.open(path, encoding="utf-8")
    content = strip_punctuation(fileObj.read())
    fileObj.close()
    for word in content.split():
        writer.add_document(path=path, content=word)

def parse_string(qstring, index):
    """Parse the user query string"""
    parser = qparser.QueryParser("content", index.schema)
    q = parser.parse(qstring)
    with index.searcher() as s:  # Try correcting the query
        corrected = s.correct_query(q, qstring)
        if corrected.query != q:
            print qstring, "--> Did you mean:", corrected.string

def extract_comments(fileObj):
    """Extract comments from spss syntax and strip out punctuation"""
    subst = strip_punctuation
    comments = [subst(line).split() for line in fileObj if line[0] == u"*"]
    fileObj.close()
    return reduce(list.__add__, comments)

syntax = u"""\
* Id iz jajaja to saj thad names shoult reveal intent. What we want to impress upon you is that
* we are serious about this. Choosing good names takes time but saves more than it takes.
* So take care with your names and change them when you find better ones. Everyone who
* reads your code (including you) will be happier if you do.
* The name of a variable, function, or class, should answer all the big questions. It
* should tell you why it exists, what it does, and how it is used. If a name requires a com-
* ment, then the name does not reveal its intent."""


if __name__ == "__main__":
    url = "http://www.gutenberg.org/ebooks/97.txt.utf-8"  # Flatland
    dst_dir = u"d:/temp"
    book = os.path.join(dst_dir, u"book.txt")
    if not os.path.exists(book):
        urllib.urlretrieve(url, book)
        write_index([book], dst_dir)
    comments = extract_comments(StringIO.StringIO(syntax))
    index = open_dir(dst_dir)  # open index from file
    for comment in comments:
        parse_string(comment, index)
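
The 'com' and 'ment' hits in the output come from a word hyphenated across two comment lines, so rejoining such words before checking might cut down on false alarms. Below is a minimal sketch of that idea, written for Python 3 (the script above is Python 2). The function name `extract_comment_words` is made up for the example, and like the code above it only looks at lines that start with '*'; it does not attempt to handle other SPSS comment forms.

```python
import io
import string


def extract_comment_words(lines):
    """Collect words from lines starting with '*', rejoining words
    that were hyphenated across a line break ('com-' + 'ment')."""
    words = []
    carry = ""  # first half of a word split at the end of the previous line
    for line in lines:
        if not line.startswith("*"):
            # A non-comment line ends any pending hyphenated word.
            if carry:
                words.append(carry)
                carry = ""
            continue
        tokens = line[1:].split()
        if carry:
            if tokens:
                tokens[0] = carry + tokens[0]
            else:
                tokens = [carry]
            carry = ""
        # Hold back a trailing 'xxx-' so it can be joined with the next line.
        if tokens and tokens[-1].endswith("-") and len(tokens[-1]) > 1:
            carry = tokens.pop()[:-1]
        words.extend(tokens)
    if carry:
        words.append(carry)
    # Remove punctuation and drop tokens that end up empty.
    table = str.maketrans("", "", string.punctuation)
    cleaned = (word.translate(table) for word in words)
    return [word for word in cleaned if word]


sample = """\
* If a name requires a com-
* ment, then the name does not reveal its intent.
COMPUTE x = 1.
"""
print(extract_comment_words(io.StringIO(sample)))
```

The returned words could then be passed one at a time to parse_string, as in the main block above.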
