Excel via Python to SPSS: problem with DATESUM function

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Excel via Python to SPSS: problem with DATESUM function

Ruben Geert van den Berg
Dear all,

If I read an Excel file into SPSS with Python, something awkward is happening to variables with timestamp formats in Excel: after converting, 2 days are added to the date. The time (even seconds) is perfectly accurate, though.

I'm doing this via Python (again many thanks to Albert-Jan for this suggestion) because I'd like to generalize it to Excel workbooks with many sheets.

Could anyone suggest how the timestamp conversion is done properly?

And also, does anybody understand where the two extra days come from?

Thank you in advance,

Ruben

*Create test data.
begin program.
import xlwt,datetime
wb=xlwt.Workbook()
ws=wb.add_sheet("date_1")
fmt = xlwt.easyxf(num_format_str='M/D/YY')
for cnt,wDate in enumerate([datetime.datetime.now() + datetime.timedelta(days=d) for d in range(10)]):
    ws.write(cnt,0,wDate,fmt)
wb.save(r"d:\temp2\datetest.xls")
end program.

*Read in test data.
begin program.
import xlrd,spss
wb=xlrd.open_workbook(r"d:\temp2\datetest.xls")
ws=wb.sheets()[0]
allData = (ws.row_values(i) for i in range(ws.nrows))
with spss.DataStep():
    nds=spss.Dataset('*')
    nds.varlist.append(ws.name,0)
    for line in allData:
        nds.cases.append(line)
end program.

*Convert date to SPSS DATETIME format.
compute date_new=datesum(date.dmy(1,1,1900),date_1,"days").
format date_new(datetime22).
exe.
Reply | Threaded
Open this post in threaded view
|

Re: Excel via Python to SPSS: problem with DATESUM function

Albert-Jan Roskam
Hi Ruben,

I am not really familiar with the builtin date functions of xlrd. But this is wicked: I created the function below and it's also 2 days off!

import datetime
import doctest

def excel2human(excelDate):
    """http://support.microsoft.com/kb/214094
    >>> excel2human(32331.06)
    '1988-07-09 13:26:24'"""
    remainder = excelDate % 1
    theTime = remainder * 24 * 60 * 60
    theDate = int(excelDate)
    ref = datetime.datetime(1900, 1, 1, 12, 0, 0, 0)
    combined = ref + datetime.timedelta(days=theDate, seconds=theTime)
    return datetime.datetime.strftime(combined, "%Y-%m-%d %H:%M:%S")
if __name__ == "__main__":
    doctest.testmod()

Regards,
Albert-Jan


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


----- Original Message -----

> From: Ruben van den Berg <[hidden email]>
> To: [hidden email]
> Cc:
> Sent: Tuesday, December 4, 2012 11:13 AM
> Subject: [SPSSX-L] Excel via Python to SPSS: problem with DATESUM function
>
> Dear all,
>
> If I read an Excel file into SPSS with Python, something awkward is
> happening to variables with timestamp formats in Excel: after converting, 2
> days are added to the date. The time (even seconds) is perfectly accurate,
> though.
>
> I'm doing this via Python (again many thanks to Albert-Jan for  this
> suggestion
> <http://spssx-discussion.1045642.n5.nabble.com/Can-I-force-SPSS-to-release-an-xls-file-from-memory-after-reading-its-data-tt5716165.html#a5716183>
> ) because I'd like to generalize it to Excel workbooks with many sheets.
>
> Could anyone suggest how the timestamp conversion is done properly?
>
> And also, does anybody understand where the two extra days come from?
>
> Thank you in advance,
>
> Ruben
>
> *Create test data.
> begin program.
> import xlwt,datetime
> wb=xlwt.Workbook()
> ws=wb.add_sheet("date_1")
> fmt = xlwt.easyxf(num_format_str='M/D/YY')
> for cnt,wDate in enumerate([datetime.datetime.now() +
> datetime.timedelta(days=d) for d in range(10)]):
>     ws.write(cnt,0,wDate,fmt)
> wb.save(r"d:\temp2\datetest.xls")
> end program.
>
> *Read in test data.
> begin program.
> import xlrd,spss
> wb=xlrd.open_workbook(r"d:\temp2\datetest.xls")
> ws=wb.sheets()[0]
> allData = (ws.row_values(i) for i in range(ws.nrows))
> with spss.DataStep():
>     nds=spss.Dataset('*')
>     nds.varlist.append(ws.name,0)
>     for line in allData:
>         nds.cases.append(line)
> end program.
>
> *Convert date to SPSS DATETIME format.
> compute date_new=datesum(date.dmy(1,1,1900),date_1,"days").
> format date_new(datetime22).
> exe.
>
>
>
> --
> View this message in context:
> http://spssx-discussion.1045642.n5.nabble.com/Excel-via-Python-to-SPSS-problem-with-DATESUM-function-tp5716635.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD