SPSSX Discussion

Replicating cases

Allan Reese (Cefas)

Replicating cases

Under v16, is there a better way to replicate cases than by writing out the data with a weight variable and making an INPUT PROGRAM? WEIGHT doesn't get applied to WRITE, PRINT or LIST. It must be a FAQ, but I can't spot it in the archive. Now that you can have many files open, there ought to be a route to use one open file as the source for an INPUT PROGRAM.

Thanks
Allan

***********************************************************************************
This email and any attachments are intended for the named recipient only. Its unauthorised use, distribution, disclosure, storage or copying is not permitted. If you have received it in error, please destroy all copies and notify the sender. In messages of a non-business nature, the views and opinions expressed are the author's own and do not necessarily reflect those of the organisation from which it is sent. All emails may be subject to monitoring.
***********************************************************************************

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Art Kendall

Re: Replicating cases

If you do not want any cases with fractional weights, try something like
this untested syntax.

Use one of these computes:
compute write_weight = trunc(weightvar+1).
compute write_weight = rnd(weightvar).

Then:
loop replic_num = 1 to write_weight.
xsave outfile='d:/project/expanded.sav' /keep = replic_num write_weight all.
end loop.
execute.
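
The two alternative computes above do not treat weights the same way. As a rough illustration outside SPSS, the Python sketch below mimics both for a few made-up weights. The helper names spss_trunc_plus_one and spss_rnd are hypothetical, and non-negative weights are assumed.

```python
import math

def spss_trunc_plus_one(weight):
    """Mimic trunc(weightvar+1): drop the fractional part of weight + 1."""
    return math.trunc(weight + 1)

def spss_rnd(weight):
    """Mimic rnd(weightvar) for non-negative weights: halves round up."""
    return int(math.floor(weight + 0.5))

# Made-up weights, printed as: weight, trunc form, rnd form.
weights = [0.4, 1.0, 2.5, 3.0]
for w in weights:
    print(w, spss_trunc_plus_one(w), spss_rnd(w))
```

Note that the trunc form turns a whole-number weight of 3 into 4 copies, so the rnd form is probably the safer choice when the weights are already integers.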

Art Kendall
Social Research Consultants

Peck, Jon

Re: Replicating cases

In reply to this post by Allan Reese (Cefas)

Input Programs don't take open datasets as input because that would mean having two different variable dictionaries active at the same time.

But programmability makes this easy. Suppose you have a weight variable named wtvar and want to write a text file. This example opens c:/temp/mydata.sav, loops through the input cases, and writes an Excel-style csv file to c:/temp/mydata.csv, repeating each case as many times as the value of its weight variable.

This version uses the SPSS 16 Dataset class. Slightly different code would be used with previous SPSS versions.

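BEGIN PROGRAM.
import spss, spssaux, csv
# Open the source file and start a data step so the Dataset class can be used.
spssaux.OpenDataFile("c:/temp/mydata.sav")
spss.StartDataStep()
# Python 2, as used with SPSS 16: the csv module expects a binary-mode file.
f = open("c:/temp/mydata.csv", "wb")
writer = csv.writer(f)
ds = spss.Dataset()
ncases = spss.GetCaseCount()
wtvar = ds.varlist['wtvar'].index

# Write each case once per unit of weight; weights are assumed to be whole numbers.
for c in range(ncases):
    for w in range(int(ds.cases[c, wtvar][0])):
        writer.writerow(ds.cases[c])

spss.EndDataStep()
f.close()
END PROGRAM.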
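
For readers without the spss module to hand, the same replicate-on-write idea can be sketched in plain Python 3. This is only an illustration under stated assumptions: the function write_replicated, the in-memory list of cases and the column layout are all made up, and the weights are assumed to be non-negative whole numbers.

```python
import csv
import io

def write_replicated(rows, weight_index, out):
    """Write each row to `out` as CSV, repeated int(weight) times."""
    writer = csv.writer(out)
    written = 0
    for row in rows:
        for _ in range(int(row[weight_index])):
            writer.writerow(row)
            written += 1
    return written

# Made-up cases: id, group, wtvar. A weight of 0 writes nothing.
cases = [[1, "a", 2.0], [2, "b", 0.0], [3, "c", 3.0]]
buffer = io.StringIO()
count = write_replicated(cases, weight_index=2, out=buffer)
print(count)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch free of file paths; passing an open file object instead should work the same way.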

HTH,
Jon Peck

Peck, Jon

Re: Replicating cases

In reply to this post by Allan Reese (Cefas)

Art and I interpreted your request differently: Art's approach creates a sav file from an SPSS dataset. I assumed, since you mentioned WRITE etc., that you wanted a text file as the output.
For just replicating the cases, XSAVE is a perfect solution. You could do the equivalent with the programmability Dataset class, but there is no need for that if XSAVE does the job.

Richard Ristow

Re: Replicating cases

In reply to this post by Allan Reese (Cefas)

At 10:38 AM 6/3/2008, Allan Reese (Cefas) wrote:

>Under v16, is there a better way to replicate cases than by writing
>out the data with a weight variable and making an INPUT
>PROGRAM? WEIGHT doesn't get applied to WRITE, PRINT or LIST.

I don't know whether this came clear from the other responses, but
LOOP *does* apply to WRITE and PRINT (which are transformation
commands). (It doesn't apply to LIST.)

Anyway, if you're comfortable with WRITE and PRINT, using them inside
LOOP in a transformation program handles your problem. In this case,
you probably need to close the transformation program with an EXECUTE.

If you want to write an SPSS dataset, use XSAVE instead of PRINT or
WRITE; XSAVE, too, is a transformation command. Alas, XSAVE can only
write to a disk file, not to a dataset.

>Now you can have many files open, there ought to be a route to use
>one open file as the source for an INPUT PROGRAM.

Jon Peck wrote, about that,
>>Input Programs don't take open datasets as input because that would
>>mean having two different variable dictionaries active at the same time.

Ah, that's the reason. I've long been sorry about the restriction,
myself. The END CASE command in an INPUT PROGRAM is the closest
equivalent to SAS's OUTPUT statement, and I've got a lot of use out
of OUTPUT.

Allan Reese (Cefas)

Replicating cases into new file - solution ( duplicating cases / disaggregate )

In reply to this post by Art Kendall

After testing, v16 under Windows worked with the following syntax. My test file includes a variable WT with positive integer values. Non-integer weights would require some arbitrary handling.

loop replic_num = 1 to wt.
xsave outfile="c:\expanded.sav" /keep = all.
end loop.
execute.

Thanks for a very neat solution, which I'm archiving with some extra keywords to help people find it in future. A nice touch is that each output case also carries replic_num, its replicate number out of wt.
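
To make the effect of the LOOP / XSAVE syntax concrete, here is a small Python sketch of the same expansion on made-up data. It is not SPSS code: replicate_cases and the dictionaries are hypothetical, and wt is assumed to hold non-negative integers as in the test file described above.

```python
def replicate_cases(cases, weight_key="wt"):
    """Return each case repeated wt times, tagged with replic_num = 1..wt."""
    expanded = []
    for case in cases:
        for replic_num in range(1, int(case[weight_key]) + 1):
            copy = dict(case)
            copy["replic_num"] = replic_num
            expanded.append(copy)
    return expanded

# Made-up input: three cases with weights 2, 1 and 3.
cases = [{"id": 1, "wt": 2}, {"id": 2, "wt": 1}, {"id": 3, "wt": 3}]
expanded = replicate_cases(cases)
for row in expanded:
    print(row)
```

As with the loop in the syntax, a case whose weight is 0 produces no output rows.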

Art gratia mihi

Allan

b.todosijevic

A Crosstab question

In reply to this post by Richard Ristow

I have a cumulative data file - same variables collected in different
years. I need to show the distribution of the variables by year.

The tables should show percentages for each year (column percentages),
*but* I would like the tables to have these features:

- percentages for all years should include also the missing values
(e.g., DK, NA);
- one particular missing value code should *not* be included in
percentage calculation nor represented by percentages in the table, but
by count ('dropped cases');
- number of valid cases per year to be included in the row above the
bottom row
- total number of cases per year in the bottom row
- totals for the entire sample in the last column (again excluding
'dropped' from percentage calculation)

Illustration:
                      Year
Gender          1987   1989   1990   Total
1  Male          45%    48%      0   46.5%
2  Female        45%    50%      0   ...

Missing:
96 DK             8%      0      0   8%
97 NA             2%      0      0   2%

99 Dropped         0     20   1200   (na)

N valid         1350    980      0   2330
N total         1500   1000   1200   3700

I wonder if it is possible to arrange such a table with SPSS syntax, so
that no further editing is necessary?
It would also be OK if I could just leave out the 'dropped' cases from
percentage calculation, and include them as count in the bottom row.
Thanks for the help.
Bojan Todosijevic
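
The arithmetic being asked for can be pinned down with a short Python sketch. It is not SPSS syntax and does not settle whether a single SPSS table can be laid out this way. The function column_summary, the constants and the sample data are made up; the codes 96, 97 and 99 follow the illustration above.

```python
from collections import Counter

DROPPED = 99         # shown as a count, excluded from the percentage base
MISSING = {96, 97}   # DK and NA: included in percentages, not in N valid

def column_summary(values):
    """Summarise one year's codes: percentages, dropped count, N valid, N total."""
    counts = Counter(values)
    n_total = len(values)
    n_dropped = counts.get(DROPPED, 0)
    base = n_total - n_dropped
    percents = {
        code: 100.0 * n / base
        for code, n in counts.items()
        if code != DROPPED
    }
    n_valid = sum(n for code, n in counts.items()
                  if code != DROPPED and code not in MISSING)
    return {"percents": percents, "dropped": n_dropped,
            "n_valid": n_valid, "n_total": n_total}

# Made-up data for one year: 45 men, 45 women, 8 DK, 2 NA, 20 dropped.
year_data = [1] * 45 + [2] * 45 + [96] * 8 + [97] * 2 + [99] * 20
summary = column_summary(year_data)
print(summary)
```

Running the function once per year, and once on the pooled data, would give the year columns and the Total column respectively.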

Matthew Reeder

Question: OMS with Crosstabs

Hey list,

I'm converting output data (crosstabs) into SPSS datafiles for additional analysis using OMS. Each is a 2x2 crosstab (2 binary variables), where one of the variables (x1) is the same from one crosstabulation to the next, while the second variable in each crosstab is always different (y1, y2, y3, . . ., yk). So, the first crosstab would be x1 against y1, the second x1 against y2, and so on. x1 takes on the levels of 'low' and 'high,' while each of the y variables takes on the values of 0 or 1.

The resultant datafile that I'm creating would ideally be structured such that each of the k y-variables from the original dataset will correspond to a row (k rows altogether). Each row would contain 4 variables corresponding to the crosstabulation results based on that variable (since x and the y-variables are dichotomous, the variables in the resultant dataset would represent counts for each of Low-0, Low-1, High-0, High-1, respectively). So, the row corresponding to say, y1, would contain variables corresponding to the values of y1 being crosstabbed against x1, and so on and so forth up through yk. Simple enough.

Here's the problem: Ideally, I would like to have the dataset contain only 4 variables for all of the crosstabulation results. However, given how OMS creates variable names when creating the new datafile, the results from each crosstabulation are used to create a new set of 4 variables (an example below). So, instead of there being 4 columns in the new dataset, I wind up with 4*k columns.

@.00_Low_Count
@.00_High_Count
@1.00_Low_Count
@1.00_High_Count
@.00_Low_Count_A
@.00_High_Count_A
@1.00_Low_Count_A
@1.00_High_Count_A

The structure of the dataset is at least somewhat correct (based on running the OMS with 'All dimensions in a single row' set). The only problem is that it's creating additional variables for each successive crosstab. Is there a clean way to get around this? Admittedly, I'm not too familiar with OMS aside from stuff I did a while back, so my apologies if this is a rather simplistic question. I referred to the OMS chapter in the user's manual; no dice. I tried playing around a bit with options under Utilities --> OMS Control Panel --> Options. Not much came about from that, either.

Thanks in advance,
Matt

Peck, Jon

Re: Question: OMS with Crosstabs

There might be another way to do this, but here's one approach assuming that there are 4 y variables and one x variable.

First, OMS is used to pick out the crosstab tables and create a new dataset that is then activated.

Some unneeded variables are then deleted along with the totals row, and finally the data are restructured with CASESTOVARS to put both rows of y values into a single case. The variable names indicate which columns are which.

dataset declare xtab.
oms select tables /if subtypes='Crosstabulation'/destination format=sav outfile=xtab /columns sequence=rall.
CROSSTABS /TABLES=x1 BY y1 y2 y3 y4.
omsend.
dataset activate xtab.
delete variables command_ subtype_ Label_ Var3 Total.
select if var1 <> "Total".
casestovars /id=var1 /groupby = var2.

HTH,
Jon Peck
