How to find date of excel file change and save that as a value for a variable?

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

How to find date of excel file change and save that as a value for a variable?

Eero Olli
Hi to you all,
 
How can I find the date a excel file has been changed and use this to
create value that I can save for each case?
 
I would love to fine a way to make the exportdate automatic and
dependent of excel-file change date, so that if there is a excel file
that has been exported on a different day, it can be detected in SPSS
data.
 
I am importing regularly 17 excel files that I first convert to 17 SPSS
files, which I then match to one large SPSS file.  Some of the files
provide cases, others provide variables.  Having the dates of export,
would allowe me to ensure that my data is consistent.  There can be a
timelag between export and import, and for me the exportdate is actually
the most important date.
 
These excel files are created manually, and things can go wrong.   Now I
make the assumption that all my files are exported on the same day,
which of course does not need to be true if something has gone wrong.
Thus, I could easily end up having one old excel file among several new
ones.
 
Now I use a pseudo code like for each of my 17 files.
 
GET FILE ...
 
* manuell update of exportdate.
COMPUTE exportdate= DATE.DMY(12,06,2008) .
VARIABLE LABELS exportdate 'Date for exporting information from
database'.

* automatic update of importdate.
COMPUTE importdate =XDATE.DATE($time).
VARIABLE LABELS importdate 'Date for importing information to SPSS'.
EXECUTE.
 
SAVE FILE...
 
 
I suspect that this problem is a common one and a great solution exists
allready. I am willing to use Python, as I cannot see how this can be
done within SPSS, only.  I am running my syntax files through the SPSS
Production Mode Facility, if that would make any difference.   It would
be best if the syntax runs on both v15 and v16.
 
Any suggestions?
 
Sincerely,
Eero Olli
________________________________________
Eero Olli
Advisor
the Equality and Anti-discrimination Ombud
[hidden email]                   +47 2405 5951
POB 8048 Dep,     N-0031 Oslo,      Norway
 

====================To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: How to find date of excel file change and save that as a value for a variable?

Albert-Jan Roskam
Hi,

Based on a file I downloaded somehwere I used the python code below. Is this what you're looking for?

Cheers!!
Albert-Jan

begin program.

"""
open all .sav files in a given dir and create a variable lastmod_date that indicates that file's last file modification date.
save as originalfilename_modified.sav
"""

import os, glob, time, spss

mydir = 'd:/temp/'
for folder in glob.glob(mydir):
    for file in glob.glob(folder + '/*.sav'):
        stats = os.stat(file)
        lastmod_date = time.localtime(stats[8])
        year = str(lastmod_date[0])
        month = str(lastmod_date[1]).zfill(2)
        day = str(lastmod_date[2]).zfill(2)
        timestamp = year + month + day
        spss.Submit("""get file = '%s'.
                               string lastmod_date (a8).
                               compute lastmod_date = '%s'.
                       variable label lastmod_date 'file modification date (year, month, day)').
                               save outfile = '%s_modified.sav'.
                        """ %(file, timestamp, file))

end program.

set mprint = on.


--- On Fri, 6/27/08, Eero Olli <[hidden email]> wrote:

> From: Eero Olli <[hidden email]>
> Subject: How to find date of excel file change and save that as a value              for a variable?
> To: [hidden email]
> Date: Friday, June 27, 2008, 3:26 PM
> Hi to you all,
>
> How can I find the date a excel file has been changed and
> use this to
> create value that I can save for each case?
>
> I would love to fine a way to make the exportdate automatic
> and
> dependent of excel-file change date, so that if there is a
> excel file
> that has been exported on a different day, it can be
> detected in SPSS
> data.
>
> I am importing regularly 17 excel files that I first
> convert to 17 SPSS
> files, which I then match to one large SPSS file.  Some of
> the files
> provide cases, others provide variables.  Having the dates
> of export,
> would allowe me to ensure that my data is consistent.
> There can be a
> timelag between export and import, and for me the
> exportdate is actually
> the most important date.
>
> These excel files are created manually, and things can go
> wrong.   Now I
> make the assumption that all my files are exported on the
> same day,
> which of course does not need to be true if something has
> gone wrong.
> Thus, I could easily end up having one old excel file among
> several new
> ones.
>
> Now I use a pseudo code like for each of my 17 files.
>
> GET FILE ...
>
> * manuell update of exportdate.
> COMPUTE exportdate= DATE.DMY(12,06,2008) .
> VARIABLE LABELS exportdate 'Date for exporting
> information from
> database'.
>
> * automatic update of importdate.
> COMPUTE importdate =XDATE.DATE($time).
> VARIABLE LABELS importdate 'Date for importing
> information to SPSS'.
> EXECUTE.
>
> SAVE FILE...
>
>
> I suspect that this problem is a common one and a great
> solution exists
> allready. I am willing to use Python, as I cannot see how
> this can be
> done within SPSS, only.  I am running my syntax files
> through the SPSS
> Production Mode Facility, if that would make any
> difference.   It would
> be best if the syntax runs on both v15 and v16.
>
> Any suggestions?
>
> Sincerely,
> Eero Olli
> ________________________________________
> Eero Olli
> Advisor
> the Equality and Anti-discrimination Ombud
> [hidden email]                   +47 2405 5951
> POB 8048 Dep,     N-0031 Oslo,      Norway
>
>
> ====================To manage your subscription to SPSSX-L,
> send a message to
> [hidden email] (not to SPSSX-L), with no body
> text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the
> command
> INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: How to find date of excel file change and save that as a value for a variable?

Peck, Jon
In reply to this post by Eero Olli
Here is another take on a solution.  It takes a file wildcard specification and creates an SPSS dataset of names, dates in seconds, and readable dates.

Suppose the code below is saved in a file named inventory.py that is on the Python search path, e.g., in c:/python25/lib/site-packages (for 16) or c:/python24/lib/site-packages (for 15).

Then you could use it as
begin program.
import spss, inventory
count = inventory.createfilechangedataset("c:/temp/*.xls", "Excelfiles")
print "File count:", count
end program.

This will work in 15 or 16 and requires the spssaux, spssdata, and namedtuple modules from SPSS Developer Central (www.spss.com/devcentral).

The first date variable is the file date in seconds as returned by the os.
The second one is an SPSS date variable holding day, month, year.

HTH,
Jon Peck

(If the mailer mangles the line structure too much, I can send this offline as an attachment.)


# create dataset of change times for all files of selected pattern

import spss,  os, glob, time
from spssdata import vdef, Spssdata, yrmodasec


def createfilechangedataset(filespec, dsname):
    """Create an SPSS dataset named dsname of all files matching filespec.  Return count

    Contents are variables filepath and change time (in seconds)."""

    filelist = glob.glob(filespec)
    if not filelist:
        return 0

    curs = Spssdata(accessType='n')
    curs.append(vdef('filespec', vfmt=("A", 100)))
    curs.append(vdef("changedata", vlabel="Date of most recent change in seconds"))
    curs.append(vdef("readabledate", vlabel="Date string of most recent change", vfmt=("DATE",12)))
    curs.commitdict()
    for f in filelist:
        changedate = os.stat(f).st_ctime
        readabledate = yrmodasec(time.localtime(changedate)[:3])
        curs.appendvalue(0, f)
        curs.appendvalue(1, changedate)
        curs.appendvalue(2, readabledate)
        curs.CommitCase()
    curs.CClose()
    return len(filelist)
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Eero Olli
Sent: Friday, June 27, 2008 7:27 AM
To: [hidden email]
Subject: [SPSSX-L] How to find date of excel file change and save that as a value for a variable?

Hi to you all,

How can I find the date a excel file has been changed and use this to
create value that I can save for each case?

I would love to fine a way to make the exportdate automatic and
dependent of excel-file change date, so that if there is a excel file
that has been exported on a different day, it can be detected in SPSS
data.

I am importing regularly 17 excel files that I first convert to 17 SPSS
files, which I then match to one large SPSS file.  Some of the files
provide cases, others provide variables.  Having the dates of export,
would allowe me to ensure that my data is consistent.  There can be a
timelag between export and import, and for me the exportdate is actually
the most important date.

These excel files are created manually, and things can go wrong.   Now I
make the assumption that all my files are exported on the same day,
which of course does not need to be true if something has gone wrong.
Thus, I could easily end up having one old excel file among several new
ones.

Now I use a pseudo code like for each of my 17 files.

GET FILE ...

* manuell update of exportdate.
COMPUTE exportdate= DATE.DMY(12,06,2008) .
VARIABLE LABELS exportdate 'Date for exporting information from
database'.

* automatic update of importdate.
COMPUTE importdate =XDATE.DATE($time).
VARIABLE LABELS importdate 'Date for importing information to SPSS'.
EXECUTE.

SAVE FILE...


I suspect that this problem is a common one and a great solution exists
allready. I am willing to use Python, as I cannot see how this can be
done within SPSS, only.  I am running my syntax files through the SPSS
Production Mode Facility, if that would make any difference.   It would
be best if the syntax runs on both v15 and v16.

Any suggestions?

Sincerely,
Eero Olli
________________________________________
Eero Olli
Advisor
the Equality and Anti-discrimination Ombud
[hidden email]                   +47 2405 5951
POB 8048 Dep,     N-0031 Oslo,      Norway


=======
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

SV: [SPSSX-L] How to find date of excel file change and save that as a value for a variable?

Eero Olli
Thanks to all who have helped!

Jon's python script took me a big leap forward. Now I can simply include
this in my Production job and get a summary of the filedate everytime I
run the script.  And it is easy to save the dates to existing files as
new variables, when needed. Another alternative, which I realised first
today, would be to insert the filenames and dates into a DOCUMENT
statement to make sure that the datafile does allways contain
information about date for exporting the data.

*Show file dates.sps

begin program.
import spss, inventory
count = inventory.createfilechangedataset("c:/temp/*.xls", "Excelfiles")

print "File count:", count
end program.

DELETE VARIABLE changedata.
RENAME VARIABLE readabledate=exportdate.

SUMMARIZE
  /TABLES=filespec exportdate
  /FORMAT=VALIDLIST NOCASENUM TOTAL LIMIT=100
  /TITLE='Export dates for all files'
  /MISSING=VARIABLE
  /CELLS=COUNT .


I did find a "bug" in the inventory.py script.  The date used was
st_ctime, which shows creation date. I changed this to st_mtime, which
shows modified time, inorder to get the date I was looking for.  The
modifed python file is below.


# create dataset of change times for all files of selected pattern
# Writen by Jon Peck
# Modified by Eero Olli 30.06.2008 - changed from st_ctime to st_mtime

import spss,  os, glob, time
from spssdata import vdef, Spssdata, yrmodasec


def createfilechangedataset(filespec, dsname):
    """Create an SPSS dataset named dsname of all files matching
filespec.  Return count

    Contents are variables filepath and change time (in seconds)."""

    filelist = glob.glob(filespec)
    if not filelist:
        return 0

    curs = Spssdata(accessType='n')
    curs.append(vdef('filespec', vfmt=("A", 100)))
    curs.append(vdef("changedata", vlabel="Date of most recent change in
seconds"))
    curs.append(vdef("readabledate", vlabel="Date string of most recent
change", vfmt=("DATE",12)))
    curs.commitdict()
    for f in filelist:
        changedate = os.stat(f).st_mtime
        readabledate = yrmodasec(time.localtime(changedate)[:3])
        curs.appendvalue(0, f)
        curs.appendvalue(1, changedate)
        curs.appendvalue(2, readabledate)
        curs.CommitCase()
    curs.CClose()
    return len(filelist)




Many thanks to all who have helped!

Best,
Eero

________________________________________
Eero Olli
Advisor
the Equality and Anti-discrimination Ombud
[hidden email]                   +47 2405 5951
POB 8048 Dep,     N-0031 Oslo,      Norway

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: SV: [SPSSX-L] How to find date of excel file change and save that as a value for a variable?

Peck, Jon
Sorry about the st_ctime.

The stat structure provides
st_atime for last access time
st_mtime for last modification, and
st_ctime, which on Windows is the creation time but is a little different on *nix platforms.

BTW, many people don't know that if you look at the Windows Explorer details view, you see the modification date, but under View/Choose Details you can opt to see the other timestamps (as well as lots of other interesting and sometimes bizarre properties).

Regards,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Eero Olli
Sent: Monday, June 30, 2008 3:37 AM
To: [hidden email]
Subject: [SPSSX-L] SV: [SPSSX-L] How to find date of excel file change and save that as a value for a variable?

Thanks to all who have helped!

Jon's python script took me a big leap forward. Now I can simply include
this in my Production job and get a summary of the filedate everytime I
run the script.  And it is easy to save the dates to existing files as
new variables, when needed. Another alternative, which I realised first
today, would be to insert the filenames and dates into a DOCUMENT
statement to make sure that the datafile does allways contain
information about date for exporting the data.

*Show file dates.sps

begin program.
import spss, inventory
count = inventory.createfilechangedataset("c:/temp/*.xls", "Excelfiles")

print "File count:", count
end program.

DELETE VARIABLE changedata.
RENAME VARIABLE readabledate=exportdate.

SUMMARIZE
  /TABLES=filespec exportdate
  /FORMAT=VALIDLIST NOCASENUM TOTAL LIMIT=100
  /TITLE='Export dates for all files'
  /MISSING=VARIABLE
  /CELLS=COUNT .


I did find a "bug" in the inventory.py script.  The date used was
st_ctime, which shows creation date. I changed this to st_mtime, which
shows modified time, inorder to get the date I was looking for.  The
modifed python file is below.


# create dataset of change times for all files of selected pattern
# Writen by Jon Peck
# Modified by Eero Olli 30.06.2008 - changed from st_ctime to st_mtime

import spss,  os, glob, time
from spssdata import vdef, Spssdata, yrmodasec


def createfilechangedataset(filespec, dsname):
    """Create an SPSS dataset named dsname of all files matching
filespec.  Return count

    Contents are variables filepath and change time (in seconds)."""

    filelist = glob.glob(filespec)
    if not filelist:
        return 0

    curs = Spssdata(accessType='n')
    curs.append(vdef('filespec', vfmt=("A", 100)))
    curs.append(vdef("changedata", vlabel="Date of most recent change in
seconds"))
    curs.append(vdef("readabledate", vlabel="Date string of most recent
change", vfmt=("DATE",12)))
    curs.commitdict()
    for f in filelist:
        changedate = os.stat(f).st_mtime
        readabledate = yrmodasec(time.localtime(changedate)[:3])
        curs.appendvalue(0, f)
        curs.appendvalue(1, changedate)
        curs.appendvalue(2, readabledate)
        curs.CommitCase()
    curs.CClose()
    return len(filelist)




Many thanks to all who have helped!

Best,
Eero

________________________________________
Eero Olli
Advisor
the Equality and Anti-discrimination Ombud
[hidden email]                   +47 2405 5951
POB 8048 Dep,     N-0031 Oslo,      Norway

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD