Need Python help

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Need Python help

Cathie Atkinson-2

I'm in the process of changing my programs to process Excel 2010 files and I've hit a glitch with python.  I have v19 and this is the syntax that's been working with .xls files:

BEGIN PROGRAM.
from xlrd import *
from xlwt import *
import spss
inbook  = open_workbook("L:\Shared\COM\Stress.Health\Public\FOOD\FOOD data\Metabolic Data/INF06.1601.v1.xls")
isheets = inbook.nsheets
outbook = Workbook()
osheets = outbook.add_sheet("merged", cell_overwrite_ok=True)
out_file= "c:/filesx/mergedtest.xls"
rowcnt  = -1
for shx in range(isheets):
       sh = inbook.sheet_by_index(shx)
       for rx in range(sh.nrows):
           rowcnt += 1
           for cx in range(sh.ncols):
               if sh.row_types(rx)[cx] and sh.row_values(rx)[cx]:
                   if shx == 0:
                       osheets.write(rx, cx, sh.cell_value(rx, cx))
                       osheets.write(rx, sh.ncols, inbook.sheet_names()[shx])
                   else:
                       osheets.write(rowcnt, cx, sh.cell_value(rx, cx))
                       osheets.write(rowcnt, sh.ncols, inbook.sheet_names()[shx])
outbook.save(out_file)

spss.Submit("""get data /type=xls /file= '%s' /sheet=name 'merged' """ % out_file + \
           """/cellrange=full /readnames=on /assumedstrwidth=32767.""" )
spss.Submit("save outfile = '%s'." % (out_file[:-4] + ".sav"))
END PROGRAM.

when I change .xls to .xlsx I get this error:
Traceback (most recent call last):
  File "<string>", line 5, in <module>
  File "C:\Python26\lib\site-packages\xlrd\__init__.py", line 429, in open_workbook
    biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
  File "C:\Python26\lib\site-packages\xlrd\__init__.py", line 1545, in getbof
    bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
  File "C:\Python26\lib\site-packages\xlrd\__init__.py", line 1539, in bof_error
    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found 'PK\x03\x04\x14\x00\x06\x00'

I've been searching online for a solution, but all I found was a 2011 post saying python didn't support Excel 2010.  I'm assuming that's no longer the case, but I'm guessing I need to update something?  I poked around Developer Central, but it's not clear to me what I need to do so I'm hoping someone will be able to guide me!

Thanks so much!
Cathie



Reply | Threaded
Open this post in threaded view
|

Re: Need Python help

Jon K Peck
I don't know whether xlrd/xlwt actually support Excel 2010.  They tend to be behind, but xlrd version 0.9.2 from April, 2013 is the most recent release of xlrd.  If you have not already installed that version, that would be the first thing to try.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        Cathie Atkinson <[hidden email]>
To:        [hidden email],
Date:        07/25/2013 01:00 PM
Subject:        [SPSSX-L] Need Python help
Sent by:        "SPSSX(r) Discussion" <[hidden email]>





I'm in the process of changing my programs to process Excel 2010 files and I've hit a glitch with python.  I have v19 and this is the syntax that's been working with .xls files:

BEGIN PROGRAM.
from xlrd import *
from xlwt import *
import spss
inbook  = open_workbook("L:\Shared\COM\Stress.Health\Public\FOOD\FOOD data\Metabolic Data/INF06.1601.v1.xls")
isheets = inbook.nsheets
outbook = Workbook()
osheets = outbook.add_sheet("merged", cell_overwrite_ok=True)
out_file= "c:/filesx/mergedtest.xls"
rowcnt  = -1
for shx in range(isheets):
      sh = inbook.sheet_by_index(shx)
      for rx in range(sh.nrows):
          rowcnt += 1
          for cx in range(sh.ncols):
              if sh.row_types(rx)[cx] and sh.row_values(rx)[cx]:
                  if shx == 0:
                      osheets.write(rx, cx, sh.cell_value(rx, cx))
                      osheets.write(rx, sh.ncols, inbook.sheet_names()[shx])
                  else:
                      osheets.write(rowcnt, cx, sh.cell_value(rx, cx))
                      osheets.write(rowcnt, sh.ncols, inbook.sheet_names()[shx])
outbook.save(out_file)

spss.Submit("""get data /type=xls /file= '%s' /sheet=name 'merged' """ % out_file + \
          """/cellrange=full /readnames=on /assumedstrwidth=32767.""" )
spss.Submit("save outfile = '%s'." % (out_file[:-4] + ".sav"))
END PROGRAM.


when I change .xls to .xlsx I get this error:
Traceback (most recent call last):
 File "<string>", line 5, in <module>
 File "C:\Python26\lib\site-packages\xlrd\__init__.py", line 429, in open_workbook
   biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
 File "C:\Python26\lib\site-packages\xlrd\__init__.py", line 1545, in getbof
   bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
 File "C:\Python26\lib\site-packages\xlrd\__init__.py", line 1539, in bof_error
   raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found 'PK\x03\x04\x14\x00\x06\x00'


I've been searching online for a solution, but all I found was a 2011 post saying python didn't support Excel 2010.  I'm assuming that's no longer the case, but I'm guessing I need to update something?  I poked around Developer Central, but it's not clear to me what I need to do so I'm hoping someone will be able to guide me!

Thanks so much!
Cathie



Reply | Threaded
Open this post in threaded view
|

Re: Need Python help

Albert-Jan Roskam
Check out [hidden email] <[hidden email]>.
The maintainers/developers of xlrd/xlwt are very active on this list.

I believe that xlrd and xlwt now support xlsx. Not sure about all the different BIFF versions of .xls though.

 
Regards,
Albert-Jan


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


From: Jon K Peck <[hidden email]>
To: [hidden email]
Sent: Thursday, July 25, 2013 9:50 PM
Subject: Re: [SPSSX-L] Need Python help

I don't know whether xlrd/xlwt actually support Excel 2010.  They tend to be behind, but xlrd version 0.9.2 from April, 2013 is the most recent release of xlrd.  If you have not already installed that version, that would be the first thing to try.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        Cathie Atkinson <[hidden email]>
To:        [hidden email],
Date:        07/25/2013 01:00 PM
Subject:        [SPSSX-L] Need Python help
Sent by:        "SPSSX(r) Discussion" <[hidden email]>





I'm in the process of changing my programs to process Excel 2010 files and I've hit a glitch with python.  I have v19 and this is the syntax that's been working with .xls files:

BEGIN PROGRAM.
from xlrd import *
from xlwt import *
import spss
inbook  = open_workbook("L:\Shared\COM\Stress.Health\Public\FOOD\FOOD data\Metabolic Data/INF06.1601.v1.xls")
isheets = inbook.nsheets
outbook = Workbook()
osheets = outbook.add_sheet("merged", cell_overwrite_ok=True)
out_file= "c:/filesx/mergedtest.xls"
rowcnt  = -1
for shx in range(isheets):
      sh = inbook.sheet_by_index(shx)
      for rx in range(sh.nrows):
          rowcnt += 1
          for cx in range(sh.ncols):
              if sh.row_types(rx)[cx] and sh.row_values(rx)[cx]:
                  if shx == 0:
                      osheets.write(rx, cx, sh.cell_value(rx, cx))
                      osheets.write(rx, sh.ncols, inbook.sheet_names()[shx])
                  else:
                      osheets.write(rowcnt, cx, sh.cell_value(rx, cx))
                      osheets.write(rowcnt, sh.ncols, inbook.sheet_names()[shx])
outbook.save(out_file)

spss.Submit("""get data /type=xls /file= '%s' /sheet=name 'merged' """ % out_file + \
          """/cellrange=full /readnames=on /assumedstrwidth=32767.""" )
spss.Submit("save outfile = '%s'." % (out_file[:-4] + ".sav"))
END PROGRAM.


when I change .xls to .xlsx I get this error:
Traceback (most recent call last):
 File "<string>", line 5, in <module>
 File "C:\Python26\lib\site-packages\xlrd\__init__.py", line 429, in open_workbook
   biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
 File "C:\Python26\lib\site-packages\xlrd\__init__.py", line 1545, in getbof
   bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
 File "C:\Python26\lib\site-packages\xlrd\__init__.py", line 1539, in bof_error
   raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found 'PK\x03\x04\x14\x00\x06\x00'


I've been searching online for a solution, but all I found was a 2011 post saying python didn't support Excel 2010.  I'm assuming that's no longer the case, but I'm guessing I need to update something?  I poked around Developer Central, but it's not clear to me what I need to do so I'm hoping someone will be able to guide me!

Thanks so much!
Cathie