This one, maybe only somebody from SPSS, Inc., can answer.
Since a number of releases back, SPSS has used what's called a 'Virtual Active File' ('VAF') in place of the old familiar scratch working file written to disk. The VAF retains access to whatever the original data source was, with the files remaining open, keeps a record of all transformations ever applied, and applies them dynamically when the file is read again. So you 'see' exactly what you would have seen if SPSS had built the scratch file.

I don't know when the VAF was introduced. The marker is the CACHE command: any version that has CACHE (not credit?) uses the VAF. CACHE forces a scratch file to be created after all.

(VAFs always save disk space. Their other effects have been discussed in a good many postings. Briefly, under some circumstances using a VAF saves time; under others, it can cost a lot of processing time, and if you have the disk space, CACHE can speed up a job dramatically.)

Now, the question: with SPSS 14 and 15 there isn't just one working file; there can be multiple open datasets. Are all open datasets implemented as VAFs, or does a dataset automatically get CACHEd when it goes inactive?

(If there can be several VAFs open, which datasets does CACHE act on? Most likely it creates a scratch file for whatever dataset is active. It could CACHE all datasets, though in that case it would be nice to have it take dataset names as parameters, so you can control what's CACHEd and what stays VAF.)

Thanks to all,
Richard
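To make the trade-off concrete, here is a toy Python model of the behaviour described above. It is an illustration only, not how SPSS is implemented; the class `VirtualActiveFile` and all of its methods and counters are hypothetical. Transformations are recorded rather than executed, every read replays them against the original source, and `cache()` materializes the result once so later passes replay nothing.

```python
# Toy model of a "virtual active file": illustrative only, not SPSS internals.
# All class, method and attribute names here are hypothetical.
from typing import Callable, Dict, List

Row = Dict[str, float]
Transform = Callable[[Row], Row]


class VirtualActiveFile:
    def __init__(self, source: List[Row]) -> None:
        self.source = source                # stands in for the still-open original data source
        self.pending: List[Transform] = []  # recorded transformations, not yet materialized
        self.source_reads = 0               # number of passes over the source
        self.transform_calls = 0            # total transformation work performed

    def transform(self, fn: Transform) -> None:
        """Record a transformation (think COMPUTE or RECODE) without running it."""
        self.pending.append(fn)

    def read(self) -> List[Row]:
        """One data pass: reread the source and replay every recorded transformation."""
        self.source_reads += 1
        rows: List[Row] = []
        for row in self.source:
            current = dict(row)
            for fn in self.pending:
                current = fn(current)
                self.transform_calls += 1
            rows.append(current)
        return rows

    def cache(self) -> None:
        """Materialize the current view once (the 'scratch file') and drop the replay list."""
        self.source = self.read()
        self.pending = []


if __name__ == "__main__":
    vaf = VirtualActiveFile([{"x": 1.0}, {"x": 2.0}])
    vaf.transform(lambda r: {**r, "y": r["x"] * 10})
    print(vaf.read())           # [{'x': 1.0, 'y': 10.0}, {'x': 2.0, 'y': 20.0}]
    vaf.read()
    print(vaf.transform_calls)  # 4: the transformation was replayed on both passes
    vaf.cache()
    vaf.read()
    vaf.read()
    print(vaf.transform_calls)  # 6: only the cache pass did any transformation work
```

In this toy model, deferring the work costs nothing in storage, but every extra pass repeats all recorded transformations; caching pays that cost once, which is the kind of speed-up CACHE is said to give on long jobs.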
Switching the active dataset does not have any effect on the VAF except to freeze its state and unfreeze or create the new one if needed. The CACHE(s) are independent.
CACHE works only on the active dataset; activating a different dataset does not in itself generally cache anything. It will cause pending transformations to be realized, which could cause caching if a CACHE command is pending or if the length of the chain of objects reaches the CACHE setting.

HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow
Sent: Saturday, February 03, 2007 10:00 PM
To: [hidden email]
Subject: [SPSSX-L] VAF and 14/15 datasets
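Jon's description can be mirrored in a small Python toy model. Treat it as a sketch under stated assumptions, not actual SPSS behaviour: `Session`, `Dataset`, `open_dataset`, `cache_setting` and the idea of counting "objects" as list entries are all hypothetical. Each dataset keeps its own chain, CACHE only flags the active one, and switching away from a dataset realizes its pending transformations, caching it only if a CACHE is pending or the chain has reached the setting.

```python
# Toy model of several open datasets, each with its own virtual active file.
# Illustrative only: Session, Dataset and every method name are hypothetical, not SPSS APIs.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    chain: List[str] = field(default_factory=lambda: ["source"])  # realized objects
    pending: List[str] = field(default_factory=list)  # transformations not yet realized
    cache_requested: bool = False  # a CACHE command is waiting for the next data pass
    times_cached: int = 0


class Session:
    def __init__(self, cache_setting: int = 20) -> None:
        self.cache_setting = cache_setting  # stands in for SET CACHE n
        self.datasets: Dict[str, Dataset] = {}
        self.active: Optional[str] = None

    def _current(self) -> Dataset:
        if self.active is None:
            raise RuntimeError("no active dataset")
        return self.datasets[self.active]

    def open_dataset(self, name: str) -> None:
        self.datasets[name] = Dataset()
        self.activate(name)

    def transform(self, command: str) -> None:
        self._current().pending.append(command)

    def cache(self) -> None:
        # CACHE only marks the active dataset; the other datasets are untouched.
        self._current().cache_requested = True

    def activate(self, name: str) -> None:
        # Leaving a dataset realizes its pending transformations and freezes it.
        if self.active is not None and self.active != name:
            self._realize(self.datasets[self.active])
        self.active = name

    def _realize(self, ds: Dataset) -> None:
        ds.chain.extend(ds.pending)
        ds.pending.clear()
        # Caching happens only if CACHE was pending or the chain got long enough.
        if ds.cache_requested or len(ds.chain) >= self.cache_setting:
            ds.chain = ["cache"]
            ds.cache_requested = False
            ds.times_cached += 1


if __name__ == "__main__":
    s = Session(cache_setting=3)
    s.open_dataset("a")
    s.transform("COMPUTE y = x * 2")
    s.cache()                      # marks "a" only
    s.open_dataset("b")            # leaving "a" realizes it; its pending CACHE fires
    s.transform("RECODE x (1=2)")
    s.activate("a")                # leaving "b" realizes it; chain of 2 < 3, so no cache
    print(s.datasets["a"].times_cached, s.datasets["b"].times_cached)  # 1 0
    print(s.datasets["b"].chain)   # ['source', 'RECODE x (1=2)']
```

The point of the toy is that the two datasets never share a cache: flagging one leaves the other's chain alone, and merely switching datasets does not cache anything unless one of the two conditions Jon names holds.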
Is CACHE relevant when using raw or system files on a stand-alone machine? Or is it only for when retrieving data from a server using SQL?

Art Kendall
Social Research Consultants
The following is an extract from the Help (see "virtual active file" in the index). As you can see, transformations generally create temporary fragments. CACHE consolidates these.
By default, changes are consolidated once there are 20 changes. You can control this value via SET CACHE n. The effect of caching is most noticeable when it is bypassing a repeated pull from an external data source, but the cache is relevant under other circumstances.

HTH,
Jon Peck

Virtual Active File

The virtual active file enables you to work with large data files without requiring equally large (or larger) amounts of temporary disk space. For most analysis and charting procedures, the original data source is reread each time you run a different procedure. Procedures that modify the data require a certain amount of temporary disk space to keep track of the changes, and some actions always require enough disk space for at least one entire copy of the data file.

Actions that don't require any temporary disk space include:
• Reading SPSS data files
• Merging two or more SPSS data files
• Reading database tables with the Database Wizard
• Merging an SPSS data file with a database table
• Running procedures that read data (for example, Frequencies, Crosstabs, Explore)

Actions that create one or more columns of data in temporary disk space include:
• Computing new variables
• Recoding existing variables
• Running procedures that create or modify variables (for example, saving predicted values in Linear Regression)

-----Original Message-----
From: Art Kendall [mailto:[hidden email]]
Sent: Sunday, February 04, 2007 10:34 AM
To: Peck, Jon
Cc: [hidden email]
Subject: Re: VAF and 14/15 datasets
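The consolidation threshold Jon describes can be mimicked with a toy counter in Python. This is a hypothetical sketch, not SPSS code: the action names, `FragmentTracker`, and the rule that each data-modifying action adds exactly one fragment are assumptions made for illustration, loosely following the two lists in the Help extract. Read-only actions add nothing; each modifying action adds one fragment; reaching the cache setting (20 by default, per Jon) consolidates them.

```python
# Toy counter for consolidating temporary fragments, loosely following the
# Help extract. Action names and FragmentTracker are hypothetical.
from typing import FrozenSet

# Actions the Help lists as needing no temporary disk space.
NO_TEMP_SPACE: FrozenSet[str] = frozenset(
    {"read_sav", "merge_sav", "read_database", "frequencies", "crosstabs", "explore"}
)
# Actions the Help lists as creating column(s) in temporary disk space.
CREATES_COLUMNS: FrozenSet[str] = frozenset({"compute", "recode", "save_predicted"})


class FragmentTracker:
    def __init__(self, cache_setting: int = 20) -> None:
        # 20 mirrors the default Jon mentions; SET CACHE n would change it.
        self.cache_setting = cache_setting
        self.fragments = 0       # changes since the last consolidation
        self.consolidations = 0  # how many times the fragments were merged

    def run(self, action: str) -> None:
        if action in NO_TEMP_SPACE:
            return  # reads the data but adds no fragment
        if action not in CREATES_COLUMNS:
            raise ValueError(f"unknown action: {action}")
        self.fragments += 1
        if self.fragments >= self.cache_setting:
            # Consolidate all accumulated fragments into a single cached copy.
            self.fragments = 0
            self.consolidations += 1


if __name__ == "__main__":
    default = FragmentTracker()                # like the default of 20 changes
    for _ in range(45):
        default.run("compute")
    default.run("frequencies")                 # no new fragment
    print(default.consolidations, default.fragments)  # 2 5

    eager = FragmentTracker(cache_setting=5)   # like SET CACHE 5
    for _ in range(12):
        eager.run("recode")
    print(eager.consolidations, eager.fragments)      # 2 2
```

In this toy model a lower setting simply consolidates more often, and a run of read-only procedures never triggers consolidation at all.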