Can python be used to apply a spell check to comments in syntax.

Art Kendall
We had some postings recently on comments in syntax.

Is there perhaps already a Python example that checks the spelling in comments? 

If not, is there a spell checker for Python? One could write SPSS syntax that reads another syntax file and passes each string to Python.

Art Kendall
Social Research Consultants

Re: Can python be used to apply a spell check to comments in syntax.

Jon K Peck
There is no spell checker in the Python standard library, but there are a number of third-party modules that could be used to build such a tool.  I have not used any of them myself.
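
As a rough illustration of what such a tool might look like with only the standard library, the sketch below uses difflib.get_close_matches to propose the nearest known word. This is string similarity, not real spell checking. KNOWN_WORDS, suggest and check_comment are hypothetical names invented for this example, and a real tool would load a full word list instead of the tiny one shown.

```python
import difflib
import re

# Hypothetical mini-dictionary for illustration only; a real tool
# would load a complete word list from a file.
KNOWN_WORDS = ["did", "is", "say", "that", "should", "reveal", "intent", "names"]


def suggest(word, vocabulary=KNOWN_WORDS, cutoff=0.6):
    """Return the closest known word, or None if the word is already
    known or nothing in the vocabulary is similar enough."""
    lowered = word.lower()
    if lowered in vocabulary:
        return None
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def check_comment(comment):
    """Return (word, suggestion) pairs for words that have a suggestion."""
    pairs = []
    for word in re.findall(r"[A-Za-z]+", comment):
        suggestion = suggest(word)
        if suggestion is not None:
            pairs.append((word, suggestion))
    return pairs


if __name__ == "__main__":
    for word, suggestion in check_comment("Id iz shoult reveal intent"):
        print(word, "--> Did you mean:", suggestion)
```

Words that are unknown but have no sufficiently similar neighbour are silently skipped here, so a production tool would want to report those separately.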

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From:        Art Kendall <[hidden email]>
To:        [hidden email],
Date:        01/18/2014 07:34 AM
Subject:        [SPSSX-L] Can python be used to apply a spell check to comments in syntax.
Sent by:        "SPSSX(r) Discussion" <[hidden email]>

Re: Can python be used to apply a spell check to comments in syntax.

Albert-Jan Roskam
>Subject: Re: [SPSSX-L] Can python be used to apply a spell check to comments in syntax.

>We had some postings recently on comments in syntax.
>Is there perhaps already a Python example that checks the spelling in comments?
>If not, is there a Spell Check in Python. One could write SPSS syntax that
>reads another syntax file and passes the string to Python.

I checked out 'whoosh' (for the fun of it!) and used one book to create an index. Then I gave it some text with deliberate errors; the output is below. I think the results will improve if you give it more data to index. I am not sure why 'reads' comes up. Note, however, this caveat:

"Currently the suggestion engine is more like a “typo corrector” than a real “spell checker” since it doesn’t do the kind of sophisticated phonetic matching or semantic/contextual analysis a good spell checker might. However, it is still very useful."
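
To make the "typo corrector" idea concrete, here is a minimal sketch of edit distance (Levenshtein distance), which is one common basis for such correctors. It is an assumption that this resembles what whoosh does internally; the thread does not confirm it. The function names, the maxdist parameter and the sample vocabulary are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def candidates(word, vocabulary, maxdist=2):
    """Return the vocabulary words within maxdist edits of word, sorted."""
    return sorted(w for w in vocabulary if edit_distance(word, w) <= maxdist)


if __name__ == "__main__":
    # Hypothetical vocabulary standing in for the words found in the indexed book.
    vocabulary = ["infant", "fiction", "should", "valuable"]
    for word in ["intent", "function", "shoult"]:
        print(word, "-->", candidates(word, vocabulary))
```

A corrector of this kind can only suggest words that occur in its index. One plausible explanation (not verified here) for correctly spelled words such as "intent" or "function" being flagged is that they simply do not occur in the single indexed book, while near neighbours such as "infant" and "fiction" do.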

The code is here:, but I also pasted it below

Id --> Did you mean: did
iz --> Did you mean: viz
saj --> Did you mean: say
thad --> Did you mean: had
shoult --> Did you mean: should
intent --> Did you mean: infant
impress --> Did you mean: express
saves --> Did you mean: save
reads --> Did you mean: read
happier --> Did you mean: happen
variable --> Did you mean: valuable
function --> Did you mean: fiction
big --> Did you mean: beg
com --> Did you mean: come
ment --> Did you mean: men
intent --> Did you mean: infant

import os.path
import urllib
import codecs
import cStringIO as StringIO
import re
import string
from whoosh import index, qparser
from whoosh.fields import Schema, ID, TEXT
from whoosh.index import open_dir

def write_index(src_paths, dst_dir):
    """Write search index"""
    schema = Schema(path=ID(unique=True, stored=True),
                    content=TEXT(spelling=True))
    ix = index.create_in(dst_dir, schema=schema)
    writer = ix.writer()
    for src_path in src_paths:
        add_doc(writer, src_path)
    writer.commit()

def strip_punctuation(s):
    """strip all the punctuation from a string"""
    return re.sub("[%s]*" % re.escape(string.punctuation), "", s)

def add_doc(writer, path):
    """Add utf-8 encoded document to index"""
    fileObj = codecs.open(path, encoding="utf-8")
    content = strip_punctuation(fileObj.read())
    fileObj.close()
    for word in content.split():
        writer.add_document(path=path, content=word)

def parse_string(qstring, index):
    """Parse the user query string"""
    parser = qparser.QueryParser("content", index.schema)
    q = parser.parse(qstring)
    with index.searcher() as s:  # Try correcting the query
        corrected = s.correct_query(q, qstring)
        if corrected.query != q:
            print qstring, "--> Did you mean:", corrected.string

def extract_comments(fileObj):
    """Extract comments from spss syntax and strip out punctuation"""
    subst = strip_punctuation
    comments = [subst(line).split() for line in fileObj if line[0] == u"*"]
    fileObj.close()
    return reduce(list.__add__, comments)

syntax = u"""\
* Id iz jajaja to saj thad names shoult reveal intent. What we want to impress upon you is that
* we are serious about this. Choosing good names takes time but saves more than it takes.
* So take care with your names and change them when you find better ones. Everyone who
* reads your code (including you) will be happier if you do.
* The name of a variable, function, or class, should answer all the big questions. It
* should tell you why it exists, what it does, and how it is used. If a name requires a com-
* ment, then the name does not reveal its intent."""

if __name__ == "__main__":
    url = ""  # Flatland
    dst_dir = u"d:/temp"
    book = os.path.join(dst_dir, u"book.txt")
    if not os.path.exists(book):
        urllib.urlretrieve (url, book)
        write_index([book], dst_dir)
    comments = extract_comments(StringIO.StringIO(syntax))
    index = open_dir(dst_dir)  # open index from file
    for comment in comments:
        parse_string(comment, index)
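
The extract_comments function above only picks up lines that begin with an asterisk. As a hedged sketch (written for Python 3, unlike the Python 2 code above), the variant below also follows continuation lines and the COMMENT keyword, treating a comment as finished at a line ending in a period or at a blank line. That is a simplification of SPSS comment rules, it ignores inline /* ... */ comments, and extract_comment_words is a hypothetical name.

```python
import re


def extract_comment_words(syntax_text):
    """Collect the words found in SPSS comment commands.

    Simplifying assumptions: a comment starts on a line beginning with
    '*' or the keyword COMMENT and continues until a line that ends
    with a period or until a blank line.
    """
    words = []
    in_comment = False
    for raw_line in syntax_text.splitlines():
        line = raw_line.strip()
        if not in_comment:
            if line.startswith("*"):
                in_comment = True
                line = line[1:]
            elif re.match(r"(?i)comment\b", line):
                in_comment = True
                line = line[len("comment"):]
        if not in_comment:
            continue  # ordinary command line, not part of a comment
        if not line:
            in_comment = False  # blank line ends the comment
            continue
        words.extend(re.findall(r"[A-Za-z']+", line))
        if line.endswith("."):
            in_comment = False  # command terminator ends the comment
    return words


if __name__ == "__main__":
    sample = (
        "* Frist comment line\n"
        "  continued here.\n"
        "FREQUENCIES VARIABLES=age.\n"
        "COMMENT anuther note.\n"
    )
    print(extract_comment_words(sample))
```

The returned words could then be passed one at a time to parse_string, or to any other suggestion routine.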
