Dear all,

I have a problem: I'm reading in an Excel workbook with 365 sheets of data. Every sheet becomes a new DataSet, and every time 50 DataSets are open simultaneously, an ADD FILES command merges them into a single DataSet before new sheets are read.

Unfortunately, I get an error, and the weird thing is that every time I run the exact same code, it appears after a different number of sheets (20, 57, 84, 82, 67 and so on...) have been processed:

>Error. Command name: GET DATA
>(2052) Error accessing the Excel file. The file may be open by another
>application. If so, close it and retry.
>* File: "d:\temp\testbook.xls"
>Execution of this command stops.

My first guess would be that SPSS is somehow 'too slow' in releasing the workbook from memory. I tried adding

cache.
exe.

after reading each sheet, but that doesn't seem to make any difference. Does anyone have an idea whether/how this could be solved?

TIA!
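The merge-every-50 bookkeeping described above can be sketched in plain Python. The sheet names below are hypothetical stand-ins; the real loop would issue GET DATA and ADD FILES through the spss module for each batch:

```python
# Group sheet names into batches of 50, mirroring the merge-every-50
# strategy described above. Plain Python; no SPSS required.

def batches(items, size):
    """Yield successive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

sheets = ["data_%d" % (i + 1) for i in range(365)]  # hypothetical sheet names

batch_list = list(batches(sheets, 50))
print(len(batch_list))      # 8 batches: 7 full ones plus a final batch of 15
print(len(batch_list[-1]))  # 15
```

Each batch would then be merged and closed before the next batch of sheets is read, keeping the number of open DataSets bounded.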
In reply to this post by Ruben Geert van den Berg
At some point I would likely say screw it and use VBA to build a concatenated text file of the 365 sheets, then run SPSS against that text file. I know this isn't the answer you are looking for, but... after wasting about 2 hrs f'ing around, I would take 10 minutes to write the VBA code, and it would probably finish running in about 20 seconds ;-)
--
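The same concatenate-then-read-once idea can be sketched in pure Python as well. The sheet data below is a hypothetical in-memory stand-in (real worksheets would be read with a library such as xlrd); the point is that SPSS then only has to read a single text file:

```python
import csv
import io

# Hypothetical stand-in for worksheets: {sheet_name: rows}.
sheets = {
    "data_20110101": [[104, "production", 3], [21, "beta", 7]],
    "data_20110102": [[60, "test", 1]],
}

# Concatenate all sheets into one CSV, tagging each row with its
# sheet name so that information is not lost in the merge.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["sheet", "PersonID", "type", "events"])
for name in sorted(sheets):
    for row in sheets[name]:
        writer.writerow([name] + row)

print(buf.getvalue().splitlines()[0])  # header row
```

Writing to a real file instead of a StringIO buffer and pointing GET DATA /TYPE=TXT at it would replace the 365 separate GET DATA calls with one.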
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Thanks David!
Another list member had the same suggestion, but I'm completely (really 100%) unfamiliar with VBA. However, I think I may fix it with the Python xlrd and xlwt modules; at least I'm familiar with that syntax. Best regards, Ruben
In reply to this post by Ruben Geert van den Berg
Ruben,

Do the sheets follow a sequential naming convention, as in Sheet_1, Sheet_2, etc.? Python or a macro should be able to read in each sheet without running into a memory issue.
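The sheets in this thread do follow a convention: one per day of 2011, named data_YYYYMMDD (see the test-data generator later in the thread). Producing that name list needs no Excel access at all; a stdlib sketch:

```python
from datetime import date, timedelta

# Build the 365 sheet names used by the test workbook:
# data_20110101 ... data_20111231 (2011 is not a leap year).
start = date(2011, 1, 1)
sheet_names = ["data_" + (start + timedelta(days=d)).strftime("%Y%m%d")
               for d in range(365)]

print(sheet_names[0], sheet_names[-1])  # data_20110101 data_20111231
```

With the names known in advance, each sheet can be requested by name without first opening the workbook to enumerate it.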
Dear Bob,
Thanks for your reply. It actually is a Python block that should do the trick, because I want the 365 sheet names in my data without needing them in my syntax. And the syntax that's mprinted back looks gorgeous! I agree that there should be no problem. But as a scientist, I must take seriously the observation that it keeps crashing, at least on my (crappy) system. Kind regards, Ruben
Hi Ruben,
I presume you've already checked out xlrd?
import xlrd
wb = xlrd.open_workbook('fatfile.xls')
for sheet in wb.sheets():  # sheet objects (wb.sheet_names() only returns strings)
    data_of_one_sheet = [sheet.row_values(i) for i in xrange(sheet.nrows)]
    # ... etc.
Regards,
Albert-Jan ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Hi Albert-Jan!
You presume correctly. I think my system resources rather than my syntax are the problem. Yesterday, I managed to process up to 338 sheets, but then it crashed again (and it usually crashes much sooner). See below for the code. Kind regards, Ruben

*Create test data.
begin program.
import os
import xlwt
from random import choice
import datetime
from datetime import date

rdir = "d:/temp/"
dates = [int((date(2011, 1, 1) + datetime.timedelta(days=x)).strftime("20%y%m%d")) for x in range(365)]
wb = xlwt.Workbook()
for date in dates:
    ws = wb.add_sheet("data_" + str(date))
    for col, cont in enumerate(['PersonID', 'type', 'date', 'events']):
        ws.write(0, col, cont)
    for row, resp in enumerate([104, 21, 60, 2, 1030]):
        ws.write(row + 1, 0, resp)
    ws.write(1, 1, 'production')  # If not, this column may not have 'production' in some sheets -> string lengths!
    for row in range(4):
        ws.write(row + 2, 1, choice(['beta', 'production', 'test']))
    for row in range(5):
        ws.write(row + 1, 2, date)
    for row in range(5):
        ws.write(row + 1, 3, choice(range(20)))
wb.save(os.path.join(rdir, 'testbook.xls'))
end program.

*Should work in theory but system resources seem to run out.
begin program.
import xlrd, spss

workbook = "d:/temp/testbook.xls"
wb = xlrd.open_workbook(workbook)
vallabs = [str(cnt + 1) + "'" + nam + "'" for cnt, nam in enumerate([s.name for s in wb.sheets()])]
datas = []
for cnt, s in enumerate(wb.sheets()):
    datas.append("data_" + str(cnt + 1))
    spss.Submit("""
GET DATA /TYPE=XLS /FILE='%s' /SHEET=name '%s'
 /CELLRANGE=full /READNAMES=on /ASSUMEDSTRWIDTH=32767.
cache.
exe.
dataset name data_%d.
comp sheet=%d.
exe.
""" % (workbook, s.name, cnt + 1, cnt + 1))
    if cnt != 0 and (cnt % 49 == 0 or cnt + 1 == len(wb.sheets())):
        spss.Submit("add files " + "/".join(["file=" + fil for fil in datas]) + ".\nexe.\ndatas clo all.\ndatas nam data_1.")
        datas = ["data_1"]
spss.Submit("val lab sheet %s." % " ".join(vallabs))
end program.
Why not start with the 365-sheet workbook, add a 366th sheet, and concatenate the 365 into it?
Then read the 366th sheet into SPSS? That would eliminate having a huge number of SPSS data sets open. Difficult to know what the source of the error is. Probably having 365 data sets open in SPSS??? >;-(
--
In reply to this post by Ruben Geert van den Berg
<snip>
>*Should work in theory but system resources seem to run out.
>
>begin program.
>workbook=("d:/temp/testbook.xls")
>import xlrd,spss
>wb=xlrd.open_workbook(workbook)
>vallabs=[str(cnt+1)+"'"+nam+"'" for cnt,nam in enumerate([s.name for s in wb.sheets()])]
>datas=[]
>for cnt,s in enumerate(wb.sheets()):
>    datas.append("data_"+str(cnt+1))

xlrd rocks. In fact it rocks so much that I wouldn't bother to use GET DATA. I suspect that you get an SPSS_FITAB_FULL error; in other words, too many datasets open at the same time. Maybe something like this? (entirely untested)

allData = (sheet.row_values(i) for sheet in wb.sheets() for i in xrange(sheet.nrows))  # generator expression --> small memory footprint
varNames = ['x', 'y', 'z']
varTypes = {'x': 0, 'y': 50, 'z': 20}
with spss.DataStep():
    datasetObj = spss.Dataset(name=None)
    for varName in varNames:
        if varTypes[varName] == 0:
            datasetObj.varlist.append(varName)
        else:
            datasetObj.varlist.append(varName, varTypes[varName])
    for line in allData:
        datasetObj.cases.append(list(line))
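One subtlety in the generator expression above: clause order matters, because the first for is the outermost loop and later clauses may reference its variable. A minimal stdlib illustration (toy lists standing in for worksheets and their rows):

```python
# In a chained comprehension/generator, the first `for` clause is the
# outermost loop; the sheet must be bound before its rows are iterated.
sheets = {"s1": [10, 11], "s2": [20]}  # toy stand-in: sheet name -> row values

# Correct order: iterate sheets first, then the rows of each sheet.
all_rows = [row for name in sorted(sheets) for row in sheets[name]]
print(all_rows)  # [10, 11, 20]
```

Writing the clauses the other way around would try to evaluate the rows before any sheet is bound, which is the kind of bug that hides easily in a one-liner.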
Dear Albert-Jan,
I wasn't familiar with "datasetObj", but after some tinkering, the code below runs like an antelope. I'd almost say it's even better than my own attempt ;-) Thanks a lot!!

Ruben

begin program.
import xlrd, spss

wb = xlrd.open_workbook("d:/temp/testbook.xls")
allData = ([sheet.name] + sheet.row_values(i) for sheet in wb.sheets() for i in xrange(1, sheet.nrows))
varNames = ['sheet', 'PersonID', 'type', 'date', 'events']
varTypes = {'sheet': 20, 'PersonID': 0, 'type': 12, 'date': 0, 'events': 0}
with spss.DataStep():
    datasetObj = spss.Dataset(name=None)
    for varName in varNames:
        if varTypes[varName] == 0:
            datasetObj.varlist.append(varName)
        else:
            datasetObj.varlist.append(varName, varTypes[varName])
    for line in allData:
        datasetObj.cases.append(list(line))
end program.