VAF and 14/15 datasets


Richard Ristow
This one, maybe only somebody from SPSS, Inc., can answer.

For a number of releases now, SPSS has used what's called a 'Virtual
Active File' ('VAF') in place of the old familiar scratch working file
written to disk. The VAF retains access to whatever the original data
source was, keeping those files open. It records every transformation
ever applied and applies them all dynamically whenever the file is read
again. So you 'see' exactly what you would have seen had SPSS built the
scratch file.
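The mechanism described above is essentially lazy evaluation. Here is a minimal Python sketch under a deliberately simplified model. The `VirtualActiveFile` class and its methods are hypothetical and are not how SPSS is implemented. The source is kept, transformations are only recorded, and every read replays them, so the result matches what a materialized scratch file would hold.

```python
# Hypothetical toy model of a "virtual active file" (VAF), NOT SPSS code:
# the class and method names are invented for illustration only.
from typing import Callable, Dict, List

Row = Dict[str, float]
Transform = Callable[[Row], Row]


class VirtualActiveFile:
    def __init__(self, source: List[Row]) -> None:
        self.source = source                   # stands in for the still-open original data
        self.transforms: List[Transform] = []  # record of every transformation applied
        self.reads = 0                         # number of full passes over the source

    def transform(self, step: Transform) -> None:
        # Like COMPUTE or RECODE: nothing runs yet, the step is only recorded.
        self.transforms.append(step)

    def read(self) -> List[Row]:
        # Like running a procedure: reread the source and replay every recorded step.
        self.reads += 1
        rows = []
        for row in self.source:
            current = dict(row)  # copy, so the source itself is never modified
            for step in self.transforms:
                current = step(current)
            rows.append(current)
        return rows


vaf = VirtualActiveFile([{"x": 1.0}, {"x": 2.0}])
vaf.transform(lambda r: {**r, "y": r["x"] * 10})
print(vaf.read())  # [{'x': 1.0, 'y': 10.0}, {'x': 2.0, 'y': 20.0}]
```

The trade-off Richard mentions shows up in `reads`. No scratch copy is ever written, but each read repeats all recorded work.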

I don't know when the VAF was introduced. The marker is the CACHE
command: any version that has CACHE (not credit?) uses the VAF. CACHE
forces a scratch file to be created after all.

(VAFs always save disk space. Their other effects have been discussed
in a good many postings. Briefly, under some circumstances using a VAF
saves time; under others, it can cost a lot of processing time, and if
you have the disk space, CACHE can speed up a job dramatically.)

Now, the question: with SPSS 14 and 15 there isn't just one working
file; there can be multiple open datasets. Are all open datasets
implemented as VAFs, or does a dataset automatically get CACHEd when it
goes inactive?

(If there can be several VAFs open, which datasets does CACHE act on?
Most likely it creates a scratch file for whatever dataset is active. It
could CACHE all datasets, though in that case it would be nice to have
it take dataset names as parameters, so you can control what's CACHEd
and what stays VAF.)

Thanks to all,
Richard

Re: VAF and 14/15 datasets

Peck, Jon
Switching the active dataset has no effect on a VAF except to freeze the state of the one being deactivated and to unfreeze (or, if needed, create) the one for the newly active dataset.  The CACHE(s) are independent.

CACHE works only on the active dataset; activating a different dataset does not, in itself, generally cache anything.  It does, however, cause pending transformations to be realized, which can trigger caching if a CACHE command is pending or if the length of the chain of objects reaches the CACHE setting.
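The rules above can be sketched as a toy Python model. It rests on our own simplifying assumptions, not on SPSS's actual design. Each dataset keeps its own pending transformations and realized fragments. Activating another dataset realizes the outgoing one's pending work. Consolidation happens at that point only if CACHE was requested or the chain has reached the cache setting. `Session`, `Dataset` and their methods are hypothetical names.

```python
# Hypothetical toy model of the behaviour described above, NOT SPSS internals.
# Dataset, Session and every method name are invented for illustration.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    pending: List[str] = field(default_factory=list)  # transformations not yet realized
    chain: List[str] = field(default_factory=list)    # realized fragments ("chain of objects")
    cache_requested: bool = False                      # a CACHE command is pending
    cached: bool = False                               # chain consolidated into one copy


class Session:
    def __init__(self, cache_setting: int = 20) -> None:
        self.cache_setting = cache_setting  # plays the role of the CACHE setting
        self.datasets: Dict[str, Dataset] = {}
        self.active: Optional[str] = None

    def activate(self, name: str) -> None:
        # Switching realizes the outgoing dataset's pending work, then freezes it.
        if self.active is not None:
            self._realize(self.datasets[self.active])
        self.datasets.setdefault(name, Dataset())  # unfreeze, or create if needed
        self.active = name

    def transform(self, step: str) -> None:
        assert self.active is not None, "no active dataset"
        self.datasets[self.active].pending.append(step)

    def cache(self) -> None:
        # CACHE applies only to the active dataset.
        assert self.active is not None, "no active dataset"
        self.datasets[self.active].cache_requested = True

    def _realize(self, ds: Dataset) -> None:
        ds.chain.extend(ds.pending)
        ds.pending.clear()
        if ds.cache_requested or len(ds.chain) >= self.cache_setting:
            ds.chain = ["<cached copy>"]
            ds.cached = True
            ds.cache_requested = False


s = Session()
s.activate("first")
s.transform("COMPUTE y = x * 2")
s.cache()
s.activate("second")  # realizes "first"; its pending CACHE takes effect
print(s.datasets["first"].cached, s.datasets["second"].cached)  # True False
```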

HTH,
Jon Peck


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow
Sent: Saturday, February 03, 2007 10:00 PM
To: [hidden email]
Subject: [SPSSX-L] VAF and 14/15 datasets

This one, maybe only somebody from SPSS, Inc., can answer.

Since a number of releases back, SPSS has used what's called a 'Virtual
Active File' ('VAF') in place of the old familiar scratch working file,
written to disk. The VAF retains access to whatever the data source was
originally, with the files remaining open, keeps a record of all
transformations ever applied, and applies them dynamically when the
file is read again. So you 'see' exactly what you would have if SPSS
did build the scratch file.

I don't know when the VAF was introduced. The marker is the CACHE
command: any version that has CACHE (not credit?) uses the VAF. CACHE
forces a scratch file to be created after all.

(VAFs always save disk space. Their other effects have been discussed
in a good many postings. Briefly, under some circumstances using a VAF
saves time; under others, it can cost a lot of processing time, and if
you have the disk space, CACHE can speed up a job dramatically.)

Now, the question: With SPSS 14 and 15 there isn't just one working
file; there can be multiple open datasets. Are all open dataset
implemented as VAFs, or does a dataset automatically get CACHEd when it
goes inactive?

(If there can be several VAFs open, what datasets does CACHE do? Most
likely it creates a scratch file for whatever dataset is active. It
could CACHE all datasets, though in that case it would be nice to have
it take dataset names as parameters, so you can control what's CACHEd
and what stays VAF.)

Thanks to all,
Richard
Reply | Threaded
Open this post in threaded view
|

Re: VAF and 14/15 datasets

Art Kendall
Is CACHE relevant when using raw or system files on a stand-alone machine?
Or is it only relevant when retrieving data from a server via SQL?

Art Kendall
Social Research Consultants


Re: VAF and 14/15 datasets

Peck, Jon
Following is an extract from the help (see virtual active file in the index).  As you can see, transformations generally create temporary fragments.  The CACHE consolidates these.

By default, changes are consolidated once there are 20 changes.  You can control this value via
SET CACHE n.

The effect of caching is most noticeable when it is bypassing a repeated pull from an external data source, but the cache is relevant under other circumstances.
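The point about repeated pulls can be illustrated with a toy Python model. `ExternalSource`, `ActiveData` and their methods are hypothetical, and the picture is simplified. Without a cache, every procedure pulls the source again and replays the transformations. With a cache, the first pass writes a scratch copy and later passes read that instead.

```python
# Hypothetical toy model, NOT SPSS internals: count full pulls from an
# external source when several procedures run, with and without a cache.
from typing import Callable, List, Optional


class ExternalSource:
    """Stands in for a database table; each fetch() is one full pull."""

    def __init__(self, rows: List[float]) -> None:
        self._rows = rows
        self.pulls = 0

    def fetch(self) -> List[float]:
        self.pulls += 1
        return list(self._rows)


class ActiveData:
    def __init__(self, source: ExternalSource, use_cache: bool) -> None:
        self.source = source
        self.use_cache = use_cache
        self.transforms: List[Callable[[float], float]] = []
        self._scratch: Optional[List[float]] = None  # cached copy, if any

    def transform(self, step: Callable[[float], float]) -> None:
        self.transforms.append(step)

    def read(self) -> List[float]:
        # Start from the scratch copy if one exists, otherwise pull the source again.
        rows = self._scratch if self._scratch is not None else self.source.fetch()
        for step in self.transforms:
            rows = [step(v) for v in rows]
        if self.use_cache:
            self._scratch = rows
            self.transforms = []  # these steps are now baked into the scratch copy
        return list(rows)


for use_cache in (False, True):
    src = ExternalSource([1.0, 2.0])
    data = ActiveData(src, use_cache)
    data.transform(lambda v: v * 2)
    for _ in range(3):  # e.g. three procedures in a row
        data.read()
    print(use_cache, src.pulls)  # False 3, then True 1
```

In this model the saving is the number of avoided pulls, so it grows with how expensive each pull is.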

HTH,
Jon Peck

Virtual Active File
The virtual active file enables you to work with large data files without requiring equally large (or larger) amounts of temporary disk space. For most analysis and charting procedures, the original data source is reread each time you run a different procedure. Procedures that modify the data require a certain amount of temporary disk space to keep track of the changes, and some actions always require enough disk space for at least one entire copy of the data file.

Actions that don't require any temporary disk space include:
• Reading SPSS data files

• Merging two or more SPSS data files

• Reading database tables with the Database Wizard

• Merging an SPSS data file with a database table

• Running procedures that read data (for example, Frequencies, Crosstabs, Explore)

Actions that create one or more columns of data in temporary disk space include:
• Computing new variables

• Recoding existing variables

• Running procedures that create or modify variables (for example, saving predicted values in Linear Regression)

-----Original Message-----
From: Art Kendall [mailto:[hidden email]]
Sent: Sunday, February 04, 2007 10:34 AM
To: Peck, Jon
Cc: [hidden email]
Subject: Re: VAF and 14/15 datasets
