Re: SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)


Chelminski, Iwona
Hi Group,
I'm thinking about buying version 15. Is it any good? They want over $1,500 for it, so I want to make sure that it's not a piece of crap like version 10 was.
Any comments?
Thanks in advance

Iwona


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
Automatic digest processor
Sent: Thursday, July 26, 2007 12:01 AM
To: Recipients of SPSSX-L digests
Subject: SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)


There are 24 messages totalling 1629 lines in this issue.

Topics of the day:

  1. SPSS 15 loses reg. with Vista
  2. ANNOUNCE: SPSS USERS MEETING (INC BOOKING FORM) REMINDER
  3. data dictionary (2)
  4. include (2)
  5. stepwise regression how to include all cases despite missing data (3)
  6. Random date generator (6)
  7. stepwise regression how to include all cases despite missing
     data
  8. Old Sunflower Option (2)
  9. AW:      Re: Old Sunflower Option
 10. Can someone please tell me how to unsubscribe from this forum, thanks in
     advance
 11. Can someone please tell me how to unsubscribe
 12. RENAME LOOP?
 13. Aggregating with missing data
 14. Matching files on one of three possible ID's

----------------------------------------------------------------------

Date:    Wed, 25 Jul 2007 04:36:06 -0400
From:    Nico Munting <[hidden email]>
Subject: Re: SPSS 15 loses reg. with Vista

I hope you have solved this problem by now, but if you haven't you might
try the following.

I do not have experience with SPSS on Vista, but my guess is that SPSS is
trying to store the registration information in C:\Program Files\ or the
equivalent in your situation. However, because in Vista you are not running
all programs as Administrator, SPSS does not have write-access to the
Program Files directory.

To give SPSS write access to the Program Files directory, right-click the
shortcut you are using to launch SPSS and select "Run As...". Choose an
account with Administrative privileges and you might be prompted for a
password or a confirmation. After this SPSS should start normally. Now
complete the registration, and since SPSS was started in Administrative
mode, it should be able to write the registration information to the
Program Files directory.

If all is well you should be able to start SPSS normally from this point
on, and hopefully it remembers your registration.

Good luck,

Nico


On Wed, 4 Jul 2007 06:33:47 +0100, David Hitchin <[hidden email]>
wrote:

>I have installed SPSS 15.0, including the patch to take it to 15.0.1 and
>the additional patch to cope with Vista problems. The new licence
>procedure works, and SPSS functions as expected.
>
>At some point, either when SPSS is stopped or the machine is turned off,
>the registration is lost; next time SPSS is started it has to be
>re-registered.
>
>Any ideas?
>
>David Hitchin

------------------------------

Date:    Wed, 25 Jul 2007 10:14:57 +0100
From:    Peter Watson <[hidden email]>
Subject: ANNOUNCE: SPSS USERS MEETING (INC BOOKING FORM) REMINDER


A reminder of the ASSESS meeting in November:

ASSESS: SPSS USERS' GROUP
21st ANNUAL MEETING
FRIDAY 9th NOVEMBER 2007
ALCUIN RESEARCH RESOURCE CENTRE AUDITORIUM
UNIVERSITY OF YORK, YORK


ASSESS is an independent user group for SPSS, a computer package for analysing
and presenting data. It is run by users, for users and is completely independent
of manufacturers of the software. The meeting is open to all users of SPSS and
to anyone interested in SPSS.

Come along to:
* hear SPSS users talk about applications, the problems and solutions
* hear the latest news from SPSS UK staff about product developments,
   and put your questions to them
* question a panel of experts about particular problems
* exchange ideas with other SPSS users
* plan for an even better user group.

The venue is the Alcuin Research Resource Centre (ARRC) on York University
campus located in Heslington, 2 miles to the south-east of the city centre.
It takes 10-15 minutes in a taxi from the railway station. The Number 4 bus
runs regularly to the University from York railway station (see
http://www.yorkshiretravel.net/). Parking at the University is very difficult.
Location details are at http://www.york.ac.uk/np/maps/. Accommodation is
bookable via tourist information on (01904) 621756 or (01904) 554455.

THE PROVISIONAL PROGRAMME**

* Welcome and introduction to meeting

* SPSS company and product news; SPSS software demonstrations

* How and why to document data for long-term storage, and what's special
about GI (geographical) data? by Allan Reese, CEFAS

* Making the world a better place with SPSS: analysing & predicting
charity donor behaviour using SPSS Base
by John Sauve-Rodd, Datapreneurs

* Applications of OMS by Gilbert MacKenzie, University of Limerick

* Multivariate aspects of testing the savannah hypothesis of shopping
by Charles Dennis, Brunel University

* Mousing with SPSS: useful point and click
by Frances Provan, University of Edinburgh


* Users' Question Time and Clinic

* Annual General Meeting of ASSESS.

Registration and coffee will start at 10am. Papers and other events will run
from 10.30am to about 5.00pm. Morning coffee, lunch and afternoon tea are
included in the registration fee. A timetable will be e-mailed to delegates
in advance of the meeting.




______________________
** The titles and order of events are subject to amendment.
--------------------------------------------------------------------------------

                              BOOKING FORM
                        ASSESS : SPSS USERS' GROUP
         Friday 9th November 2007, ARRC auditorium, University of York

Important:

Bookings will not be treated as firm until a cheque or official (company) order,
payable to ASSESS, is received. Payment possible by BACS. Details on request.


Name:  ______________________________ Tel: ____________________
Email: ______________________________ Fax: ____________________

Job Title:    ___________________________________
Organization: ___________________________________
Address:      ___________________________________________________
               ___________________________________________________
               _______________________   Postcode  _______________

Strike out the sections which do not apply to you, or otherwise amend as
appropriate:

INDIVIDUAL BOOKING. Please reserve a place for me, at a cost of 65 GBP.

CORPORATE BOOKING. (Enter the appropriate amounts)

Please reserve ______ places, at a cost of £_____ (65 GBP for the first person,
and 55 GBP for each subsequent person).

Names of attendees : 1. _______________________________________
(for badges)         2. _______________________________________
                      3. _______________________________________
                      4. _______________________________________
                      5. _______________________________________

STUDENT (POST-GRADUATE) BOOKING. (Enclose photocopied evidence of status
for 2007-2008 academic year). Please reserve for me one of the student
places at a cost of 40 GBP.

Specify vegetarian or other dietary requirements, if any:
________________________________________________________________

Cheque or official order enclosed for _______GBP

For official orders please also give here the number and address for invoicing:
________________________________________________________________
________________________________________________________________
________________________________________________________________


(Please indicate if you require a receipt of payment)

Return completed forms to: Peter Watson, MRC Cognition and Brain
Sciences Unit, 15 Chaucer Road, Cambridge, CB2 7EF.

Telephone enquiries about bookings: 01223 355294 x801 (has an answerphone)

E-mail enquiries about bookings: [hidden email]
(important: put "ASSESS" in the Subject field)


------------------------------

Date:    Wed, 25 Jul 2007 03:41:21 -0700
From:    Albert-jan Roskam <[hidden email]>
Subject: Re: data dictionary

DISPLAY VARIABLES / VARIABLES = myvar1 myvar2.
DISPLAY DICTIONARY / VARIABLES = myvar1 myvar2.

Will display only the variables of interest.


--- "Parry, James" <[hidden email]> wrote:

> Hi Ken,
>
> Try File . . . Display Data File Information . . .
> Working File (if you have the data open). I believe
> the only way to limit the variables displayed in
> this command is to physically drop the variables.
>
> -HTH
>
> -----Original Message-----
> From: SPSSX(r) Discussion
> [mailto:[hidden email]] On Behalf Of Ken
> Wood
> Sent: Tuesday, July 24, 2007 1:13 PM
> To: [hidden email]
> Subject: data dictionary
>
> How does one save (in order to print) the
> information about all the variables in a given
> dataset?  That is, the information (or selected
> information) that one sees in the Variable View?
>


Cheers!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Did you know that 87.166253% of all statistics claim a precision of results that is not justified by the method employed? [HELMUT RICHTER]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




------------------------------

Date:    Wed, 25 Jul 2007 12:48:59 +0200
From:    Maddalena Agonigi <[hidden email]>
Subject: include

I am new to SPSS programming and so any help will be deeply appreciated.

I use INSERT in a script:

INSERT FILE='C:\Test.sps'
  SYNTAX=BATCH Error=Stop CD=YES.

but SPSS gives me the following message:

INSERT <quì>FILE='C:\Test.sps
(0) Istruzione non valida.

(In English: "INSERT <here>FILE='C:\Test.sps ... (0) Invalid statement.")

Thank you all

------------------------------

Date:    Wed, 25 Jul 2007 05:55:45 -0700
From:    Albert-jan Roskam <[hidden email]>
Subject: Re: include

Hi,

--What SPSS version are you using? INSERT was introduced
relatively recently (SPSS v13+, I think). Have
you tried using INCLUDE instead?
--Are you absolutely sure the sps file is where you
say it is?

Albert-Jan



Cheers!
Albert-Jan


------------------------------

Date:    Wed, 25 Jul 2007 09:49:25 -0400
From:    Ken Wood <[hidden email]>
Subject: Re: data dictionary

Thank you for the many suggestions.  For those interested, the suggestions I received are below.






You can go to the File menu and choose the option to display information
about the current working file or an external file. You will get output
that you can save in .spo format or even copy to Word, Excel or any other
software.




Do Display Dictionary, then go to the Output, select the information
displayed, and either print from there, or export to either RTF or
Excel formats. Recent versions of SPSS provide information on
variable names and structure separately from Value labels, which is
a bit annoying.



Try this command in syntax:

display dictionary.




DISPLAY VARIABLES / VARIABLES = myvar1 myvar2.
DISPLAY DICTIONARY / VARIABLES = myvar1 myvar2.

Will display only the variables of interest.




* Route every output table except the Notes table to a plain-text file.
OMS
  /SELECT TABLES
  /EXCEPTIF LABELS = ['Notes']
  /DESTINATION
        FORMAT = TEXT
        OUTFILE = "dictionary.txt".
DISPLAY DICTIONARY.
OMSEND.



>
> Try File . . . Display Data File Information . . .
> Working File (if you have the data open). I believe
> the only way to limit the variables displayed in
> this command is to physically drop the variables.
>
>
>
> -----Original Message-----
> From: SPSSX(r) Discussion
> [mailto:[hidden email]] On Behalf Of Ken
> Wood
> Sent: Tuesday, July 24, 2007 1:13 PM
> To: [hidden email]
> Subject: data dictionary
>
> How does one save (in order to print) the
> information about all the variables in a given
> dataset?  That is, the information (or selected
> information) that one sees in the Variable View?
>


Ken Wood, PhD
Research Scientist
KMRREC
West Orange, NJ
973-243-6871

------------------------------

Date:    Wed, 25 Jul 2007 21:58:49 +0800
From:    Hunna Watson <[hidden email]>
Subject: stepwise regression how to include all cases despite missing data

Hi all,

I'm running a stepwise regression of organizational practices on
construction projects that predict project cost growth. I have data for
115 projects, yet some organizational practices were not applicable on
some projects (in a random fashion). The missing data is obviously
purposeful and not due to respondents failing to fill in questionnaires etc.
SPSS automatically excludes cases with any missing values, or wants to
substitute a value, so I end up with a regression being carried out on
10 projects, which is obviously not useful. Any suggestions for syntax to
include all cases, or suggestions to rectify this problem?

Thanks in advance,
Hunna Watson
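Why listwise deletion is so drastic here can be seen with a little arithmetic. The sketch below assumes, purely for illustration, that each predictor is missing independently with the same probability p; then the expected number of complete cases is n(1 - p)^k. Plugging in the figures from this thread (115 projects, the 87 predictors Hunna mentions later, about 10 complete cases) gives the per-predictor missing rate that would produce that outcome. The independence assumption and the function names are hypothetical.

```python
def expected_complete_cases(n_cases, n_predictors, p_missing):
    """Expected cases left after listwise deletion, assuming each predictor
    is missing independently with the same probability p_missing."""
    return n_cases * (1.0 - p_missing) ** n_predictors


def implied_missing_rate(n_cases, n_predictors, n_complete):
    """Per-predictor missing rate that, under the same independence
    assumption, would leave n_complete cases after listwise deletion."""
    return 1.0 - (n_complete / n_cases) ** (1.0 / n_predictors)


# Figures from the thread: 115 projects, 87 predictors, about 10 complete cases.
p = implied_missing_rate(115, 87, 10)
print(f"implied missing rate per predictor: {p:.3f}")
print(f"complete cases at 5% missing: {expected_complete_cases(115, 87, 0.05):.1f}")
```

Under these assumptions a missing rate of under 3% per predictor is already enough to shrink 115 projects to about 10, and at 5% only about one complete case would be expected.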

------------------------------

Date:    Wed, 25 Jul 2007 09:12:37 -0500
From:    Melissa Ives <[hidden email]>
Subject: Re: Random date generator

During the assessment process, interviewers are given these instructions
for estimating a date when a client cannot remember specific days or
months. Perhaps you could create a similar algorithm?

Date Guidelines (d/e):  Use the following rules if the participant is
unsure of the exact date:
DAY: Use the 5th for the beginning of the month, 15th for the middle of
the month, and 25th for the end of the month.
MONTH: Use March for early in the year, July for middle of the year, and
October for later in the year, but try to make it so the number of weeks
is about right.
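The day and month rules above can be sketched as a small lookup, shown here in Python. The text codes ("beginning", "middle", "end", "early", "late") are hypothetical stand-ins for however the vague answers are actually coded in the dataset, and the "number of weeks about right" adjustment is a judgment call that this sketch does not attempt.

```python
from datetime import date

# Hypothetical codes for the vague answers; the real dataset's codes are unknown.
DAY_FOR_PART_OF_MONTH = {"beginning": 5, "middle": 15, "end": 25}
MONTH_FOR_PART_OF_YEAR = {"early": 3, "middle": 7, "late": 10}  # March, July, October


def estimate_day(day):
    """Keep an exact day; otherwise map the vague answer to the guideline day."""
    return day if isinstance(day, int) else DAY_FOR_PART_OF_MONTH[day]


def estimate_month(month):
    """Keep an exact month; otherwise map the vague answer to the guideline month."""
    return month if isinstance(month, int) else MONTH_FOR_PART_OF_YEAR[month]


def estimate_date(year, month, day):
    """Build a date, filling vague month/day answers with the guideline values."""
    return date(year, estimate_month(month), estimate_day(day))


print(estimate_date(2006, "middle", "end"))  # July 25th under these rules
```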

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Syed Hashmi
Sent: Thursday, July 19, 2007 9:48 PM
To: [hidden email]
Subject: [SPSSX-L] Random date generator

Dear co-listers,

A dataset that I'm analyzing has a set of dates for events (start and stop
dates) as well as how long those events occurred for. The data for each
date is in three variables (month, day, year). The years are pretty
complete if they are filled in, but the month and day are sometimes
listed as the exact month or date and other times as beginning, middle
or end of the year (for the month variable) or of the month (for the day
variable).

Thus, I have 7 vars (startm, startd, starty, stopm, stopd, stopy,
duration) from which I can deduce the start and stop dates (startdt,
stopdt).
Unfortunately, I have the complete start and stop dates for only about
half the cases. The rest are missing parts of one of the dates (e.g.
day) or of both. If I have one of the dates and a duration, I can
calculate the other date.

The reason for this post is that there is a small subset of the
population where I have the complete stop date but am missing the start
day (I have the year and month) and am also missing the duration.  I had
to come up with some way to impute a start date for these cases for
analysis (which will be done with and without these specific cases).  I
know that the event could not be more than a month long. Therefore, what
I was planning on doing was based on the information I have, calculate
the earliest possible start date (e_startdt) up to a month before the
stop date and then randomly pick a date between e_startdt and the stop
date.

Therefore, my query here was this: how can I code for this. I have an
idea of how to do it in SAS but since I'm working in SPSS that doesn't
help much.  I'm assuming that it will be something simple like:

     startdt = e_startdt + RANDOM_DAYS.

where RANDOM_DAYS is a random whole number of days between 0 and
DATEDIFF(stopdt, e_startdt, "days").

So how would I go about doing this? I tried using the help files and all
but couldn't come up with something that worked. Is this the best way to
do this? Any other way that I can do this? Does it matter what kind of
seeding I use for the random number generator?

Thanks.

- Shahrukh


PRIVILEGED AND CONFIDENTIAL INFORMATION
This transmittal and any attachments may contain PRIVILEGED AND
CONFIDENTIAL information and is intended only for the use of the
addressee. If you are not the designated recipient, or an employee
or agent authorized to deliver such transmittals to the designated
recipient, you are hereby notified that any dissemination,
copying or publication of this transmittal is strictly prohibited. If
you have received this transmittal in error, please notify us
immediately by replying to the sender and delete this copy from your
system. You may also call us at (309) 827-6026 for assistance.
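The approach Shahrukh describes (find the earliest possible start date, then pick a uniformly random number of days up to the stop date) can be sketched in Python as below. Assumptions, all hypothetical: "no more than a month long" is taken as 30 days, the start date must also fall inside the known start year and month, and the function names are illustrative only. In SPSS syntax the corresponding pieces would presumably be DATEDIFF, RV.UNIFORM and SET SEED, but this sketch does not show that translation.

```python
import calendar
import random
from datetime import date, timedelta

# Assumption: "could not be more than a month long" is taken as 30 days.
MAX_EVENT_DAYS = 30


def start_date_bounds(start_year, start_month, stop_dt, max_days=MAX_EVENT_DAYS):
    """Earliest and latest start dates consistent with the known start
    year/month, the known stop date and the maximum event length."""
    first_of_month = date(start_year, start_month, 1)
    days_in_month = calendar.monthrange(start_year, start_month)[1]
    last_of_month = date(start_year, start_month, days_in_month)
    earliest = max(first_of_month, stop_dt - timedelta(days=max_days))  # e_startdt
    latest = min(last_of_month, stop_dt)
    if earliest > latest:
        raise ValueError("no start date fits the known fields")
    return earliest, latest


def impute_start_date(start_year, start_month, stop_dt, rng):
    """Pick a start date uniformly at random between the bounds (inclusive)."""
    earliest, latest = start_date_bounds(start_year, start_month, stop_dt)
    random_days = rng.randint(0, (latest - earliest).days)  # RANDOM_DAYS
    return earliest + timedelta(days=random_days)


# A fixed seed makes the imputation reproducible from run to run.
rng = random.Random(20070719)
print(impute_start_date(2007, 2, date(2007, 3, 10), rng))
```

On the seeding question: for this purpose the main thing a fixed seed buys is reproducibility, so the "with and without these cases" analyses can be rerun and get the same imputed dates.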

------------------------------

Date:    Wed, 25 Jul 2007 11:13:35 -0300
From:    Hector Maletta <[hidden email]>
Subject: Re: stepwise regression how to include all cases despite missing data

         Hunna,
         Your use of the stepwise method for regression instead of running
with all the variables at once is immaterial for your problem. What you need
is some way of dealing with projects where some organizational practice does
not apply.
         You do not give many details about the variables, but I imagine
that each organizational practice might be a dummy variable, either present
or absent. In such case, you may posit its effect on costs not as a result
of "choosing or not choosing it when it is adequate to choose it" but as a
result of its mere presence. A project may benefit from a practice if (a)
the practice is applicable and (b) it is actually used; otherwise the
project does not benefit from that practice. The absence of a practice may
thus be a result of deliberate choice or impossibility of application, but
in either case it would result in its effect not being observed. In other
words, you may (if your particular situation affords this interpretation)
treat the "missing" cases as negative instances, as zeroes in the dummies,
and proceed with the regression.
         If this road is not conceptually adequate, you're in trouble.
         Hector
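Hector's suggestion amounts to recoding "not applicable" to 0 for the predictors before fitting, so those cases are no longer dropped. A minimal Python sketch, assuming rows are dicts with None for a blank answer; the variable names (incentives, site_mgmt, cost_growth) and the data values are hypothetical. Only the predictors are recoded, never the outcome. In SPSS syntax the same step would presumably be a RECODE with the SYSMIS keyword.

```python
def recode_not_applicable(rows, predictors, value=0):
    """Return copies of rows with blank (None) predictor values set to value."""
    recoded = []
    for row in rows:
        new_row = dict(row)  # copy so the original data stay untouched
        for name in predictors:
            if new_row.get(name) is None:
                new_row[name] = value
        recoded.append(new_row)
    return recoded


def complete_cases(rows, variables):
    """Rows with no missing value in any of the listed variables (listwise)."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]


# Hypothetical example: ratings 1-5, None where the practice was not used.
rows = [
    {"incentives": 4, "site_mgmt": None, "cost_growth": 0.12},
    {"incentives": None, "site_mgmt": 3, "cost_growth": 0.05},
    {"incentives": 2, "site_mgmt": 5, "cost_growth": 0.08},
]
predictors = ["incentives", "site_mgmt"]
print(len(complete_cases(rows, predictors + ["cost_growth"])))  # listwise
fixed = recode_not_applicable(rows, predictors)
print(len(complete_cases(fixed, predictors + ["cost_growth"])))  # after recode
```

With a 1 to 5 rating scale, 0 sits below the scale, which is the "not quite clean conceptually" caveat Hector raises later in the thread.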



------------------------------

Date:    Wed, 25 Jul 2007 10:24:27 -0400
From:    Mark A Davenport MADAVENP <[hidden email]>
Subject: Re: stepwise regression how to include all cases despite missing data

Hector,

When I worked at ACT, Inc we often treated the student's school identifier
this way, usually with great success.  Granted, we had many thousands of
cases to draw from.  Hunna only has 115?  She is going to run out of cases
pretty quickly, don't you think?


***************************************************************************************************************************************************************
Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more
than an exact answer to an approximate question.' --a paraphrase of J. W.
Tukey (1962)







------------------------------

Date:    Wed, 25 Jul 2007 11:36:49 -0300
From:    Hector Maletta <[hidden email]>
Subject: Re: stepwise regression how to include all cases despite missing
         data

Hunna:

Even with a scale, the "missing" responses can be reinterpreted as saying
"This practice was not effective in this case because, for one reason or
another, it was not used". This is not quite clean conceptually, but it is
your only choice unless you put up with working with 10 cases.

The problem, apparently, is in the design of the questionnaire, asking for
the effectiveness of a practice that is not in universal use among the cases
under analysis. In any case, a practice cannot have any effectiveness if it
is not used, so I insist you can treat it as having zero effectiveness when
it was not used.

On the other hand, since your variables seem to be many, and your cases seem
to be few, perhaps you should consider a more artisanal approach for
identifying effective strategies instead of your all-in-one regression.
With 87 predictors and 115 cases you don't have a chance even without a
single missing value.



Hector
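The "87 predictors and 115 cases" point can be made concrete with the residual degrees of freedom of an ordinary least-squares fit, n - k - 1 with an intercept. A short Python sketch; the function name is illustrative only.

```python
def residual_df(n_cases, n_predictors, intercept=True):
    """Residual degrees of freedom for an ordinary least-squares regression."""
    return n_cases - n_predictors - (1 if intercept else 0)


for n in (115, 10):
    df = residual_df(n, 87)
    status = "error variance estimable" if df > 0 else "model cannot be fitted"
    print(f"{n} cases, 87 predictors: residual df = {df} ({status})")
```

With all 115 projects the full model leaves only 27 residual degrees of freedom; with the 10 complete cases the count is negative, so the full model cannot be estimated at all.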







  _____

From: Hunna Watson [mailto:[hidden email]]
Sent: 25 July 2007 11:36
To: Hector Maletta
Subject: FW: Re: stepwise regression how to include all cases despite
missing data



Thanks for your reply. I've just come on board this project in the past two
weeks and the data has already been collected. This is essentially what
happened, though I'm simplifying it: respondents rated how effective the use
of the strategy was for preventing cost growth in the form of work that had
to be redone on the project, so I have data on a scale and no
possibility of coding absent or present :S



extra information....



Yes, I know all the horrible things about stepwise, but it is the only
suitable method I can think of to answer the research questions; I have just
come on board the project in the last two weeks. The research is very
exploratory and the topic hasn't been examined before. Data has been
collected on many different predictor variables (design-related sources,
subcontractor sources, site management sources, contract documentation; the
list goes on and on, up to a terrible 87 predictors). There are 115
projects, so each is a case if you like, and we want to look at this
data set first (no options there), but after that we can merge it with another
data set containing information on a further 160 projects. Some predictors
weren't relevant to some projects. For instance, some didn't use incentives, but
we have ratings on scales of 1 to 5 (assessing raters' perceptions of the
contribution of use of that method to costs), and we are seeking to predict
costs from the predictor variables. If a method wasn't applicable (e.g., use
of a particular incentive plan), it was left blank on the questionnaires. No
logical ordering.


------------------------------

Date:    Wed, 25 Jul 2007 11:48:56 -0400
From:    "William B. Ware" <[hidden email]>
Subject: Old Sunflower Option

In the older versions of SPSS, there was something called the sunflower
option to show multiple cases at the same point in a scatter plot.  Does
anyone know how to do that in Version 14 and up?

Bill

__________________________________________________________________________
William B. Ware, Professor                         Educational Psychology,
CB# 3500                                       Measurement, and Evaluation
University of North Carolina                         PHONE  (919)-962-7848
Chapel Hill, NC      27599-3500                      FAX:   (919)-962-1533
Office:  118 Peabody Hall                            EMAIL: [hidden email]
Adjunct Professor                                    School of Social Work
__________________________________________________________________________

------------------------------

Date:    Wed, 25 Jul 2007 10:31:24 -0600
From:    ViAnn Beadle <[hidden email]>
Subject: Re: Old Sunflower Option

The nearest equivalent I can think of to the old sunflower option is the
bin.hex function in GPL. It groups nearby points into hexagonal bins, and then
you use the summary.count function to count the number of hits within each bin
to set the size of the point displayed. This produces a plot sometimes referred
to as a bubble plot.

Here's some sample syntax which produces a "bubble" plot using the Employee
data.sav sample file. The SCALE statement constrains the minimum point size to
5 pixels. IMHO, the default sizing creates really, really small 1 pixel
points which look like dust on my monitor, so this gets around that. The
color.interior function fills in the points (which default to circles). I think
the default hollow points are ugly, so this gets around that.


GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=prevexp jobtime MISSING=
  LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
 SOURCE: s=userSource(id("graphdataset"))
 DATA: prevexp=col(source(s), name("prevexp"))
 DATA: jobtime=col(source(s), name("jobtime"))
 GUIDE: axis(dim(1), label("Previous Experience (months)"))
 GUIDE: axis(dim(2), label("Months since Hire"))
 SCALE: linear(aesthetic(aesthetic.size), aestheticMinimum(size."5px"))
ELEMENT: point(position(bin.hex(prevexp*jobtime, dim(1,2))),
size(summary.count()), color.interior(color.blue))
END GPL.

The IGRAPH procedure also provides a jittering option which nudges the
points slightly apart by adding a small amount of random variation, but I
think the GPL binhex approach works much better.
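Both ideas (count the cases that land in the same place and size the marker by that count, or jitter the points apart) can be sketched outside SPSS to see what they do to the data. This Python sketch is only an illustration: it snaps points to a square grid rather than GPL's hexagonal bins, and the names, the 2-pixels-per-case scaling and the 5-pixel floor (loosely echoing aestheticMinimum(size."5px")) are assumptions.

```python
import random
from collections import Counter


def overlap_counts(xs, ys, resolution=1.0):
    """Count cases per cell after snapping each point to a square grid."""
    cells = Counter()
    for x, y in zip(xs, ys):
        cell = (round(x / resolution) * resolution, round(y / resolution) * resolution)
        cells[cell] += 1
    return cells


def marker_sizes(counts, min_px=5, px_per_case=2):
    """Marker size per cell, proportional to its count but never below min_px."""
    return {cell: max(min_px, n * px_per_case) for cell, n in counts.items()}


def jitter(values, amount, rng):
    """Nudge each value by uniform random noise in [-amount, amount]."""
    return [v + rng.uniform(-amount, amount) for v in values]


xs = [1, 1, 1, 2, 3.2]
ys = [5, 5, 5, 6, 6.9]
counts = overlap_counts(xs, ys)
print(marker_sizes(counts))
print(jitter(xs, 0.25, random.Random(1)))
```

Counting keeps the positions honest and shows multiplicity through size, whereas jittering moves every point slightly, which is why the binned version tends to read better when many cases coincide exactly.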


------------------------------

Date:    Wed, 25 Jul 2007 18:39:32 +0200
From:    Georg Maubach <[hidden email]>
Subject: AW:      Re: Old Sunflower Option

Hi All,

We tried to run the syntax example on our "Employee data.sav" file. Because we have a German version, the variable names have been translated into German. Does anybody know how we could obtain the sample files in English?

Best regards

Georg Maubach
Research Manager



------------------------------

Date:    Wed, 25 Jul 2007 13:28:22 -0500
From:    "Hashmi, Syed S" <[hidden email]>
Subject: Re: Random date generator

Thanks Melissa,

I'm already doing something similar to what you said, using the 1st, 10th
and 20th as the dates for the start, middle and end of the month.  The month
is a bit trickier since my exposures are events during pregnancy, so I
have to be careful about just assigning a random month lest it fall
outside the pregnancy duration.

The question I had asked concerned dates which had a month and year but
no day information, not even beginning, middle or end.  Therefore, I
couldn't even depend on the 1st-10th-20th coding.

My final solution for the problem where I had a start date and had a
stop month was to pick a random date between the start date (or the
first date of the stop month) and the last day of the stop month.  Gene
Maguin had emailed me earlier and suggested I use the UNIFORM function
to randomly select a date.  I don't think that message was posted on the
list-serv, so the body is copied below.  I've used the function since
and it works nicely.

Thanks again for your help though. I agree that there should be some
sort of algorithm in place at the interviewer level to minimize the
frequency of incomplete data.

- S. Hashmi

*copy of email from Gene*

> -----Original Message-----
> From: Gene Maguin [mailto:[hidden email]]
> Sent: Friday, July 20, 2007 8:05 AM
> To: Hashmi, Syed S
> Subject: RE: Random date generator
>
> Syed,
>
> I'd like to be helpful to you but I don't have time to make up a full
> solution. I think this would be a valid example of your question.
>
> Start date (mm/dd/yyyy): 5/x/2004
> Stop date (mm/dd/yyyy):  6/17/2004
> Possible duration range (6/17/2004)-(5/31/2004)=17 days to
> (6/17/2004)-(5/18/2004)=30 days (I assume a 30 day month)
>
> So x has to be between 18 and 31 inclusive.
>
> So I think the trick to the random draw is this command.
>
> Compute x=uniform(14).
> Compute x=trunc(x).
>
> Check this but I'm pretty sure that the range of x will be 0 to 13.
> Your actual date is then
> Compute x=x+18.
>
> There are lots of big 'little bits' to tidy up, but this will get you
> what you want when the tidying up has been done.
>
> Best wishes, Gene Maguin
>
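Gene's "check this" can be verified outside SPSS. The Python sketch below mimics COMPUTE x = TRUNC(UNIFORM(14)) + 18. It treats SPSS's UNIFORM(n) as a draw from [0, n) and uses Python's random.random() as the stand-in, so the random stream itself differs from SPSS's generator.

```python
import math
import random
from datetime import date


def trunc_uniform_day(first_day, last_day, rng):
    """Mimic COMPUTE x = TRUNC(UNIFORM(n)) + first_day, with n days in the range."""
    n = last_day - first_day + 1          # 14 possible days for 18..31
    x = rng.random() * n                  # stand-in for UNIFORM(n): 0 <= x < n
    return math.trunc(x) + first_day      # TRUNC gives 0..n-1, then shift


rng = random.Random(2007)                 # fixed seed so the run is repeatable
days = [trunc_uniform_day(18, 31, rng) for _ in range(10_000)]
print(min(days), max(days))               # range of the drawn day of month

# Gene's example: start on 5/x/2004, stop on 6/17/2004.
start = date(2004, 5, days[0])
print(start, (date(2004, 6, 17) - start).days)   # duration in days
```

TRUNC of a value in [0, 14) can only be 0 to 13, so adding 18 gives days 18 to 31. That matches Gene's range and a duration of 17 to 30 days before 6/17/2004.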


> -----Original Message-----
> From: Melissa Ives [mailto:[hidden email]]
> Sent: Wednesday, July 25, 2007 9:13 AM
> To: Hashmi, Syed S; [hidden email]
> Subject: RE: [SPSSX-L] Random date generator
>
> During the assessment process, interviewers are given these instructions
> for estimating a date when a client cannot remember specific days or
> months. Perhaps you could create a similar algorithm?
>
> Date Guidelines (d/e):  Use the following rules if the participant is
> unsure of the exact date:
> DAY: Use the 5th for the beginning of the month, 15th for the middle of
> the month, and 25th for the end of the month.
> MONTH: Use March for early in the year, July for middle of the year, and
> October for later in the year, but try to make it so the number of weeks
> is about right.
>
> Melissa
>
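Melissa's interviewer guideline translates directly into a lookup. The minimal Python sketch below uses hint labels ("beginning", "middle", "end", "early", "late") chosen here for illustration. It does not attempt the "number of weeks is about right" adjustment, which still needs human judgment.

```python
from datetime import date

# Substitute values from the guideline quoted above.
DAY_FOR_HINT = {"beginning": 5, "middle": 15, "end": 25}
MONTH_FOR_HINT = {"early": 3, "middle": 7, "late": 10}   # March, July, October


def fill_partial_date(year, month=None, day=None, month_hint=None, day_hint=None):
    """Return a date, replacing an unknown month or day by the guideline value."""
    if month is None:
        month = MONTH_FOR_HINT[month_hint]   # KeyError if there is no usable hint
    if day is None:
        day = DAY_FOR_HINT[day_hint]
    return date(year, month, day)


print(fill_partial_date(2004, month=5, day_hint="middle"))
print(fill_partial_date(2004, month_hint="early", day_hint="end"))
```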
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
> Syed Hashmi
> Sent: Thursday, July 19, 2007 9:48 PM
> To: [hidden email]
> Subject: [SPSSX-L] Random date generator
>
> Dear co-listers,
>
> A dataset that I'm analyzing has a set of dates for events (start and
> stop dates) as well as how long those events occurred for.  The data for
> each date is in three variables (month, day, year). The years are pretty
> complete if they are filled in, but the month and day are sometimes
> listed as the exact month or date and other times they're listed as
> beginning, middle or end of the year (for the month variable) or the
> month (for the day variable).
>
> Thus, I have 7 vars (startm, startd, starty, stopm, stopd, stopy,
> duration) from which I can deduce the start and stop dates (startdt,
> stopdt).
> Unfortunately, I have the complete start and stop date for about half
> the cases. The rest are missing either parts of one of the dates (e.g.,
> day) or of both.  If I have one of the dates and a duration, I can
> calculate the other date.
>
> The reason for this post is that there is a small subset of the
> population where I have the complete stop date but am missing the start
> day (I have the year and month) and am also missing the duration.  I had
> to come up with some way to impute a start date for these cases for
> analysis (which will be done with and without these specific cases). I
> know that the event could not be more than a month long. Therefore, what
> I was planning on doing was, based on the information I have, to
> calculate the earliest possible start date (e_startdt) up to a month
> before the stop date and then randomly pick a date between e_startdt and
> the stop date.
>
> Therefore, my query here was this: how can I code for this? I have an
> idea of how to do it in SAS but since I'm working in SPSS that doesn't
> help much.  I'm assuming that it will be something simple like:
>
>      startdt = e_startdt + RANDOM_DAYS.
>
> where RANDOM_DAYS is a random number chosen from DATEDIFF(stopdt,
> startdt, "days").
>
> So how would I go about doing this? I tried using the help files and all
> but couldn't come up with something that worked. Is this the best way to
> do this? Any other way that I can do this? Does it matter what kind of
> seeding I use for the random number generator?
>
> Thanks.
>
> - Shahrukh
>
>
> PRIVILEGED AND CONFIDENTIAL INFORMATION
> This transmittal and any attachments may contain PRIVILEGED AND
> CONFIDENTIAL information and is intended only for the use of the
> addressee. If you are not the designated recipient, or an employee
> or agent authorized to deliver such transmittals to the designated
> recipient, you are hereby notified that any dissemination,
> copying or publication of this transmittal is strictly prohibited. If
> you have received this transmittal in error, please notify us
> immediately by replying to the sender and delete this copy from your
> system. You may also call us at (309) 827-6026 for assistance.
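The following is a minimal Python sketch of Syed's plan. As in Gene's worked example, it assumes that the start month and year are known and that the event lasts at most 30 days. Two details are made explicit here. First, the draw runs from e_startdt, not from startdt, which is the unknown. Second, when the start month is known, the latest possible start is the end of that month or the stop date, whichever comes first. The function name and the 30-day limit are assumptions for illustration.

```python
import calendar
import random
from datetime import date, timedelta


def impute_start_date(stop_dt, start_year, start_month, rng, max_days=30):
    """Draw a start date uniformly from every date consistent with the known facts."""
    first_of_month = date(start_year, start_month, 1)
    last_of_month = date(start_year, start_month,
                         calendar.monthrange(start_year, start_month)[1])
    e_startdt = max(first_of_month, stop_dt - timedelta(days=max_days))  # earliest start
    l_startdt = min(last_of_month, stop_dt)                              # latest start
    if e_startdt > l_startdt:
        raise ValueError("start month is inconsistent with stop date and max_days")
    span = (l_startdt - e_startdt).days       # number of days between the two bounds
    return e_startdt + timedelta(days=rng.randint(0, span))  # both ends included


rng = random.Random(19)                       # fixed seed: repeatable imputations
draws = [impute_start_date(date(2004, 6, 17), 2004, 5, rng) for _ in range(1000)]
print(min(draws), max(draws))
```

On the seeding question, a fixed seed mainly buys reproducibility: rerunning the job gives the same imputed dates. In SPSS, SET SEED plays that role.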

------------------------------

Date:    Wed, 25 Jul 2007 15:05:07 -0400
From:    Richard Ristow <[hidden email]>
Subject: Re: Random date generator

Somehow I missed or deleted the original posting in this thread.
Anyway, on Thursday, July 19, 2007 9:48 PM Hashmi, Syed S asked,

>A dataset that I'm analyzing has a set of dates for events (start and
>stop dates) as well as how long those events occurred for.  The data
>for each date is in three variables (month, day, year). The years are
>pretty complete if they are filled in, but the month and day are
>sometimes listed as the exact month or date and other times they're
>listed as beginning, middle or end of the year (for the month
>variable) or the month (for the day variable).
>
>I have [two dates as three variables each, plus a duration].
>I have the complete start and stop date for about half the cases. The
>rest are missing either parts of one of the dates (eg. day) or for
>both.  If I have one of the dates and a duration, I can calculate the
>other date.

So far, so good, though be careful about how precise your 'durations'
are.

>There is a small subset of the population where I have the complete
>stop date but am missing the start day (I have the year and month) and
>am also missing the duration.  I had to come up with some way to
>impute a start date for these cases for analysis (which will be done
>with and without these specific cases).  I know that the event could
>not be more than a month long. I was planning to calculate the earliest
>possible start date (e_startdt) up to a month before the stop date and
>then randomly pick a date between e_startdt and the stop date.

OUCH! I would not do this. Period.

*MAYBE* the start dates and durations you get this way will be vaguely
representative of the population of events, though I doubt it. Are your
durations roughly uniformly distributed from 0 to 30 days? For goodness'
sake, you ought to check that before proceeding.

But even if they're representative of the population, they have nothing
to do with the individual cases for which they're 'imputed'. No
analysis using those 'dates' will be in the least trustworthy.

A far better approach is to use true missing-value interpolation on the
*durations*, not the dates. (See SPSS 'MVA'.) I'm not clear how many
durations you'd have to impute. If it's near 50%, that won't be at all
reliable, either.

-Good luck,
  Richard
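One simple way to make the check Richard asks for is a Pearson chi-square goodness-of-fit test against a uniform distribution over whole days 0 to 30. This choice is made here; Richard does not specify a method. The Python sketch below bins the observed durations, compares each bin with its expected count, and returns the statistic and degrees of freedom. With the default 6 bins (5 df), a statistic above about 11.07 rejects uniformity at the 5% level. Durations outside the range, such as the >30-day values mentioned later in the thread, raise an error rather than being silently dropped.

```python
def uniform_chi_square(durations, low=0, high=30, n_bins=6):
    """Pearson chi-square statistic for 'whole-day durations are uniform on low..high'."""
    if not durations:
        raise ValueError("no durations supplied")
    n_days = high - low + 1                       # 31 possible values for 0..30
    observed = [0] * n_bins
    days_in_bin = [0] * n_bins
    for d in range(low, high + 1):                # how many possible days each bin covers
        days_in_bin[(d - low) * n_bins // n_days] += 1
    for d in durations:
        if not low <= d <= high:
            raise ValueError(f"duration {d} outside {low}..{high}")
        observed[(d - low) * n_bins // n_days] += 1
    total = len(durations)
    stat = 0.0
    for obs, width in zip(observed, days_in_bin):
        expected = total * width / n_days         # uniform share for this bin
        stat += (obs - expected) ** 2 / expected
    return stat, n_bins - 1


print(uniform_chi_square(list(range(31)) * 10))   # perfectly uniform sample
print(uniform_chi_square([7] * 100))              # everything at one week
```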

------------------------------

Date:    Wed, 25 Jul 2007 15:09:53 -0500
From:    "Hashmi, Syed S" <[hidden email]>
Subject: Re: Random date generator


Richard,

Thanks for your input.  I realize that I was stepping into extremely
treacherous territory when I decided to impute dates and select random
ones.  As for the durations being roughly uniformly distributed, that's
what it looks like from the data I do have.  Initially, I'd assumed that
durations would have a mean of about 7 days but somehow the data I do
have doesn't seem to show that.  It's more or less uniformly
distributed.  There were some durations that were >30 days but I doubt
if they're true.  Therefore, I decided to go ahead with the uniform
distribution (although, the whole imputation and random selection still
bothers me).

The reason that I'm trying to get an idea about the dates, especially
the event start dates, is due to the nature of the study question. I'm
looking at the occurrence of certain events during pregnancy.  However,
these events of interest have to occur within the first trimester, or if
I narrow it down further, the first two months of pregnancy.  Therefore,
I have to know if an event occurred within a certain period of time
after the last menstrual date as reported by the woman.  At the end of
the day, the variables for all the events get filtered down to a single
dichotomous variable - Y/N did the event occur during the period of
interest?

I will do the analysis with and without the cases where the dates have
been imputed from incomplete data.  I hadn't previously thought of using
true-missing value interpolation on the durations but I'll look into it.
I've never done that before so will have to read up a bit on it.  I
might have an issue with the number of missing values, though, since more
cases have at least some part of the date than a duration value.

Thanks again for your advice. It's always nice to get a fresh look at an
issue.

- Shahrukh
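Below is a minimal Python sketch of the final Y/N step. The 60-day window is a hypothetical stand-in for "the first two months of pregnancy"; the real cut-off would come from the study definition. The second function works with the range of possible event dates rather than a single imputed date. Where the whole range falls inside or outside the window, the flag does not depend on the imputation at all. Only the straddling cases, returned as None, are actually decided by the random draw.

```python
from datetime import date, timedelta


def in_window(event_dt, lmp_dt, window_days=60):
    """True if the event falls on or after the LMP and within window_days of it."""
    return lmp_dt <= event_dt <= lmp_dt + timedelta(days=window_days)


def flag_for_date_range(earliest, latest, lmp_dt, window_days=60):
    """Y/N flag when the event date is only known to lie in [earliest, latest].

    Returns True or False when every possible date gives the same answer,
    and None when the range straddles a window boundary.
    """
    win_end = lmp_dt + timedelta(days=window_days)
    if lmp_dt <= earliest and latest <= win_end:
        return True                   # entire range inside the window
    if latest < lmp_dt or earliest > win_end:
        return False                  # entire range outside the window
    return None                       # answer depends on the unknown day


lmp = date(2004, 3, 1)                # window runs to 2004-04-30
print(flag_for_date_range(date(2004, 3, 10), date(2004, 3, 20), lmp))
print(flag_for_date_range(date(2004, 6, 1), date(2004, 6, 30), lmp))
print(flag_for_date_range(date(2004, 4, 20), date(2004, 5, 15), lmp))
```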

------------------------------

Date:    Wed, 25 Jul 2007 16:24:09 -0400
From:    Gene Maguin <[hidden email]>
Subject: Re: Random date generator

Syed,

It sounds like you are going to use the imputed dates to decide if something
happened or not. The new variable, 'something happened or not' might be a
dependent variable or it might be an independent variable. There's a
literature on estimating relationships in the presence of missing data. To
correctly estimate relationships (or, at least, come very close), you should
use either multiple imputation or a maximum likelihood estimation method
that incorporates the EM algorithm. So far as I know, SPSS has neither. The
key person here is Donald Rubin. But, there are other, more recent articles.

Gene Maguin
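Gene's multiple-imputation point can be made concrete as follows. Rather than one random draw per case, the data are imputed m times and each completed data set is analysed. The m results are then pooled with Rubin's rules:

- pooled estimate = mean of the estimates
- total variance = mean within-imputation variance + (1 + 1/m) times the between-imputation variance

The Python sketch below shows only this pooling step. Producing proper imputations is the hard part and is not shown.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool m completed-data results (point estimates and their squared SEs)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations, one variance per estimate")
    q_bar = statistics.fmean(estimates)       # pooled point estimate
    w = statistics.fmean(variances)           # average within-imputation variance
    b = statistics.variance(estimates)        # between-imputation variance (m - 1 divisor)
    t = w + (1 + 1 / m) * b                   # total variance
    return q_bar, math.sqrt(t)                # estimate and pooled standard error


print(pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

The (1 + 1/m) term is what a single imputation ignores. Treating one imputed data set as real data drops the between-imputation spread, so the standard errors come out too small.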

------------------------------

Date:    Wed, 25 Jul 2007 20:43:06 +0000
From:    Hamish Travers <[hidden email]>
Subject: Can someone please tell me how to unsubscribe from this forum,
         thanks in advance


------------------------------

Date:    Wed, 25 Jul 2007 16:41:18 -0500
From:    "Hashmi, Syed S" <[hidden email]>
Subject: Re: Random date generator

Thanks Gene,

After the comments that you and Richard made, I'm seriously considering
rethinking the whole thing.  Maximum likelihood estimation was something
that I had thought of initially but didn't follow up on.  I guess it's
time that I do.  Thanks again for your help.

- Shahrukh



------------------------------

Date:    Wed, 25 Jul 2007 18:33:35 -0400
From:    Richard Ristow <[hidden email]>
Subject: Re: Can someone please tell me how to unsubscribe

FAQ: How to unsubscribe, or leave list SPSSX-L:

Requests to unsubscribe that are posted to the list will never be
acted on.

You must send the request to [hidden email].

From the E-mail address from which you're subscribed to the list, send
a message to [hidden email] with the following words in the
body of the message:

SIGNOFF SPSSX-L

Don't put anything else (your name, etc.) in the body of the message.

It should work. If it doesn't, go to the following Web page:

http://www.listserv.uga.edu/cgi-bin/wa?SUBED1=spssx-l&A=1

and unsubscribe from there.

...........................
More information:

When you subscribed to the list, you received a welcome message (I'm
copying it below) with instructions (including asking you to save it).

From the welcome message:

>Your  subscription to  the SPSSX-L  list (SPSSX(r)  Discussion)
>has  been accepted.
>
>Please save this message for future  reference, [...]
>
>You may leave the list at any time by sending a "SIGNOFF SPSSX-L"
>command to [hidden email].

There are many other commands that can be sent to the same address, to
manage your subscription. If you send mail to [hidden email]
with the text

INFO REFCARD

and no other text, you will be mailed a file describing those commands.

------------------------------

Date:    Wed, 25 Jul 2007 16:00:22 -0700
From:    Karen Powers <[hidden email]>
Subject: RENAME LOOP?

Hello SPSS list,
I have a dataset in which the variable names are var002, var003   ...
var3477.

I would like to RENAME each variable with the number listed in row 1 of
each respective column.
The first number in var002 is 6951030.  I would like var002 renamed
"rs6951030".
The command for doing this once that runs nicely is:

RENAME VARS var002 = "rs"+ "6951030".
EXE.

Now I would like to do this for all vars through var3477.
I have tried DO REPEAT and LOOP commands but they both say
 >Warning # 141.  Command name: RENAME VARS
 >DO REPEAT has no effect on this command.

Any ideas on how I can do this (3477 times)?

Thanks, Karen
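As the warning says, DO REPEAT has no effect on RENAME VARIABLES. One common workaround is to generate the whole RENAME command as text and run it. The minimal Python sketch below assumes you already have the variable names and their row-1 values in two lists; getting them there is not shown. It writes one (old = new) pair per line, which RENAME VARIABLES accepts, and refuses to build a command with duplicate new names.

```python
def build_rename_command(old_names, row1_values, prefix="rs"):
    """Return one SPSS RENAME VARIABLES command mapping each variable to prefix + row-1 value."""
    if len(old_names) != len(row1_values):
        raise ValueError("need exactly one row-1 value per variable")
    new_names = [f"{prefix}{int(v)}" for v in row1_values]   # 6951030.0 -> rs6951030
    if len(set(new_names)) != len(new_names):
        raise ValueError("row-1 values would produce duplicate variable names")
    pairs = [f"  ({old} = {new})" for old, new in zip(old_names, new_names)]
    return "RENAME VARIABLES\n" + "\n".join(pairs) + "."


# var002 ... var3477, as in the post; only the first three are used below.
old = [f"var{i:03d}" for i in range(2, 3478)]
# 1234567 and 7654321 are made-up row-1 values for illustration.
print(build_rename_command(old[:3], [6951030, 1234567, 7654321]))
```

If the SPSS Python plug-in is installed, the resulting string should be usable with spss.Submit. Otherwise it can be written to a file and pasted into a syntax window.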

------------------------------

Date:    Wed, 25 Jul 2007 19:27:19 -0400
From:    Richard Ristow <[hidden email]>
Subject: Re: Aggregating with missing data

At 03:30 AM 7/14/2007, Marco wrote:

>When using the aggregate (mean) function in SPSS, cells that contain
>missing data become empty. Thus, a cell that should contain the mean
>of multiple cells (one of which is empty/missing), turns to zero
>because it contains one missing datum.

I'm not sure what's happening to you, but you shouldn't be seeing what
you say you're seeing.

From your description, it sounds like you're doing one of two things:
a) Using the MEAN function with command AGGREGATE to average over a set
of variables
b) Using the MEAN function in the transformation language to average
over a set of variables.

BOTH of those, however, ignore missing values when averaging, and take
the mean of the non-missing values; they don't make a value 0 because
there's a missing value in the list. (That would be a very dangerous
thing to do anyway.) So I'm not sure what's happening.

Could you post the syntax, some test data, what output you get, and
tell us what output you want?

.....................................
Here are demonstrations of averaging across cases with AGGREGATE, and
averaging across variables. It's SPSS 15 draft output (WRR-not saved
separately).
.....................................
Using AGGREGATE to average over cases:
List
|-----------------------------|---------------------------|
|Output Created               |25-JUL-2007 19:16:03       |
|-----------------------------|---------------------------|
[Aggregate]

Group Value

     1     1
     1     2
     1     3
     2     4
     2     5
     2     .
     2     7
     3     .
     3     9
     3    10


Number of cases read:  10    Number of cases listed:  10


AGGREGATE OUTFILE=*
   /BREAK=GROUP
   /Members 'Size of group' = NU
   /MEAN    'Mean of "value"' = MEAN(VALUE).

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |25-JUL-2007 19:16:03       |
|-----------------------------|---------------------------|
Group Members     MEAN

     1       3     2.00
     2       4     5.33
     3       3     9.50

Number of cases read:  3    Number of cases listed:  3
.....................................
Using MEAN to average over variables:
List
|-----------------------------|---------------------------|
|Output Created               |25-JUL-2007 19:16:04       |
|-----------------------------|---------------------------|
[Wide]

Group Members Value.1 Value.2 Value.3 Value.4

     1      3       1       2       3       .
     2      4       4       5       .       7
     3      3       .       9      10       .


Number of cases read:  3    Number of cases listed:  3


NUMERIC Mean (F6.2).
COMPUTE Mean = MEAN(Value.1 TO Value.4).
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |25-JUL-2007 19:16:26       |
|-----------------------------|---------------------------|
[Wide]

Group Members Value.1 Value.2 Value.3 Value.4   Mean

     1      3       1       2       3       .    2.00
     2      4       4       5       .       7    5.33
     3      3       .       9      10       .    9.50

Number of cases read:  3    Number of cases listed:  3

===================
APPENDIX:  All code
===================
I keyed the test data into the Data Editor; however, it can be
recovered from the LIST output fairly easily. Here's all the code:

DATASET ACTIVATE TestData.
DATASET COPY     Aggregate.
DATASET ACTIVATE Aggregate WINDOW=FRONT.

LIST.

AGGREGATE OUTFILE=*
   /BREAK=GROUP
   /Members 'Size of group' = NU
   /MEAN    'Mean of "value"' = MEAN(VALUE).

LIST.

DATASET ACTIVATE TestData.
DATASET COPY     Wide.
DATASET ACTIVATE Wide      WINDOW=FRONT.

SORT CASES BY Group .
CASESTOVARS
  /ID = Group
  /GROUPBY = VARIABLE
  /COUNT = Members "Size of group" .

LIST.

NUMERIC Mean (F6.2).
COMPUTE Mean = MEAN(Value.1 TO Value.4).

LIST.
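The same point can be shown in a few lines of Python, using the test data from the listings above with None for system-missing. The per-group mean is taken over the valid values only, while the group size counts every case, as NU does. Treating missing as 0 (the behaviour Marco describes) gives a different, biased answer for groups 2 and 3.

```python
from collections import defaultdict

# (Group, Value) pairs from the LIST output above; None marks a missing value.
cases = [(1, 1), (1, 2), (1, 3), (2, 4), (2, 5), (2, None), (2, 7),
         (3, None), (3, 9), (3, 10)]


def mean_of_valid(values):
    """Mean of the non-missing values, or None if every value is missing."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


by_group = defaultdict(list)
for group, value in cases:
    by_group[group].append(value)

for group, values in sorted(by_group.items()):
    members = len(values)                             # like NU: all cases
    mean = mean_of_valid(values)                      # like MEAN(): valid values only
    wrong = sum(v or 0 for v in values) / members     # missing treated as 0
    print(group, members, f"{mean:.2f}", f"{wrong:.2f}")
```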

------------------------------

Date:    Wed, 25 Jul 2007 20:24:54 -0400
From:    Richard Ristow <[hidden email]>
Subject: Re: Matching files on one of three possible ID's

At 09:29 AM 7/12/2007, Daniel Robertson wrote:

>I have two database extracts that I am trying to merge, one of which
>contains enrolled students and the other contains approximately the
>same group of students when they were applicants. In the Enrollment
>file students are uniquely identified by 'enroll_id'. In the Applicant
>file there is a primary ID, 'applicant_id1', but there may be up to
>two other IDs which were issued and updated provisionally as the
>student was going through the application process. The rub is that
>'enroll_id' may match any one of the applicant IDs, not necessarily
>the primary one.

Gene's given you a workable solution. It requires sorting the data
three times;  but with three keys, something like that is inevitable.

You *can* combine the three sorting operations into one step, by using
XSAVE to create three copies of each Applicant record, one each in
which 'enroll_id' is loaded from each of the three candidate key
variables in the Applicant file. Then sort the resulting file by that
'enroll_id', MATCH FILES with the Enrollment file, and discard any
Applicant records that don't match.

Now, that's the simplest possible case. You may need logic in case,
say, the same ID value occurs in more than one of the Applicant-record
fields. But it's another way to go.

Sorry; no code this time.
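Below is a minimal sketch of that stack-then-match logic, written in Python rather than SPSS syntax. Each Applicant record is emitted once per distinct candidate ID, which corresponds to the XSAVE step, and a dictionary lookup stands in for the sort and MATCH FILES. Only 'applicant_id1' and 'enroll_id' come from the post; 'applicant_id2' and 'applicant_id3' are hypothetical names for the provisional IDs. Skipping an ID already seen in the same record handles the repeated-value case Richard mentions. Returning a list per enroll_id exposes any enrollee matched by more than one applicant record.

```python
def match_on_any_id(applicants, enroll_ids,
                    id_fields=("applicant_id1", "applicant_id2", "applicant_id3")):
    """Map each enroll_id to the applicant records holding it in any ID field."""
    enroll_ids = set(enroll_ids)
    matches = {}
    for record in applicants:
        seen = set()
        for field in id_fields:          # one 'copy' of the record per candidate ID
            key = record.get(field)
            if key is None or key in seen:
                continue                 # missing ID, or same ID twice in this record
            seen.add(key)
            if key in enroll_ids:        # unmatched copies are simply discarded
                matches.setdefault(key, []).append(record)
    return matches


applicants = [
    {"applicant_id1": "A1", "applicant_id2": "E7"},
    {"applicant_id1": "E9"},
    {"applicant_id1": "E5", "applicant_id2": "E5", "applicant_id3": "X3"},
]
result = match_on_any_id(applicants, ["E5", "E7", "E9", "E0"])
for enroll_id, records in sorted(result.items()):
    print(enroll_id, len(records))
```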

------------------------------

End of SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)
**************************************************************

Re: SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)

Mark A Davenport MADAVENP
IMHO 15 is far superior to version 10, but a bit more quirky than 14 with
the patch.  I have had no problems with my patched versions of 15: one
networked here at the office and one stand-alone at the house.

Keep in mind that 16 is scheduled to come out in Fall.  You might want to
wait.

***************************************************************************************************************************************************************
Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more
than an exact answer to an approximate question.' --a paraphrase of J. W.
Tukey (1962)






"Chelminski, Iwona" <[hidden email]>
Sent by: "SPSSX(r) Discussion" <[hidden email]>
07/26/2007 11:52 AM
Please respond to
"Chelminski, Iwona" <[hidden email]>


To
[hidden email]
cc

Subject
Re: SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)






Hi Group,
I'm thinking about buying the 15 version. Is it any good? They want over
$1,500 for it so i want to make sure that it's not a piece of crap like
the version 10 was.
Any comments?
Thanks in advance

Iwona


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
Automatic digest processor
Sent: Thursday, July 26, 2007 12:01 AM
To: Recipients of SPSSX-L digests
Subject: SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)


There are 24 messages totalling 1629 lines in this issue.

Topics of the day:

  1. SPSS 15 loses reg. with Vista
  2. ANNOUNCE: SPSS USERS MEETING (INC BOOKING FORM) REMINDER
  3. data dictionary (2)
  4. include (2)
  5. stepwise regression how to include all cases despite missing data (3)
  6. Random date generator (6)
  7. stepwise regression how to include all cases despite missing
     data
  8. Old Sunflower Option (2)
  9. AW:      Re: Old Sunflower Option
 10. Can someone please tell me how to unsubscribe from this forum, thanks
in
     advance
 11. Can someone please tell me how to unsubscribe
 12. RENAME LOOP?
 13. Aggregating with missing data
 14. Matching files on one of three possible ID's

----------------------------------------------------------------------

Date:    Wed, 25 Jul 2007 04:36:06 -0400
From:    Nico Munting <[hidden email]>
Subject: Re: SPSS 15 loses reg. with Vista

I hope you have solved this problem by now, but if you haven't you might
try the following.

I do not have experience with SPSS on Vista, but my guess is that SPSS is
trying to store the registration information in C:\Program Files\ or the
equivalent in your situation. However, because in Vista you are not
running
all programs as Administrator, SPSS does not have write-access to the
Program Files directory.

To give SPSS write-access to the Program Files directory right click on
the
shortcut you are using to launch SPSS and select "Run As...". Choose an
account with Administrative privileges and you might be prompted for a
password or a confirmation. After this SPSS should start normally. Now
complete the registration, and since SPSS was started in Administrative
mode, it should be able to write the registration information to the
Program Files directory.

If all is well you should be able to start SPSS normally from this point
on, and hopefully it remembers your registration.

Good luck,

Nico


On Wed, 4 Jul 2007 06:33:47 +0100, David Hitchin
<[hidden email]>
wrote:

>I have installed SPSS 15.0, including the patch to take it to 15.0.1 and
>the additional patch to cope with Vista problems. The new licence
>procedure works, and SPSS functions as expected.
>
>At some point, either when SPSS is stopped or the machine is turned off,
>the registration is lost; next time SPSS is started it has to be
>re-registered.
>
>Any ideas?
>
>David Hitchin

------------------------------

Date:    Wed, 25 Jul 2007 10:14:57 +0100
From:    Peter Watson <[hidden email]>
Subject: ANNOUNCE: SPSS USERS MEETING (INC BOOKING FORM) REMINDER

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware
tools.

---559023410-1804928587-1185354897=:4092
Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE


A reminder of the ASSESS meeting in November:

ASSESS: SPSS USERS' GROUP
21st ANNUAL MEETING
FRIDAY 9th NOVEMBER 2007
ALCUIN RESEARCH RESOURCE CENTRE AUDITORIUM=20
UNIVERSITY OF YORK, YORK


ASSESS is an independent user group for SPSS, a computer package for
analys=
ing=20
and presenting data. It is run by users, for users and is completely
indepe=
ndent
  of manufacturers of the software. The meeting is open to all users of
SPS=
S and=20
to anyone interested in SPSS.

Come along to:
* hear SPSS users talk about applications,the problems and solutions
* hear the latest news from SPSS UK staff about product developments,
   and put your questions to them
* question a panel of experts about particular problems
* exchange ideas with other SPSS users
* plan for an even better user group.

The venue is the Alcuin Research Resource Centre (ARRC) on York
University=
=20
campus located in Heslington, 2 miles to the south-east of the city
centre.=
=20
It takes 10-15 minutes in a taxi from the railway station. The Number 4
bus=
=20
runs regularly to the University from York railway station (see=20
http://www.yorkshiretravel.net/). Parking at the University is very
difficu=
lt.=20
Location details are at http://www.york.ac.uk/np/maps/. Accommodation
is=20
bookable via tourist information on (01904) 621756 or (01904) 554455.

THE PROVISIONAL PROGRAMME**

* Welcome and introduction to meeting

* SPSS company and product news; SPSS software demonstrations

* How and why to document data for long-term storage, and what's
special=20
about GI (geographical) data? by Allan Reese, CEFAS

* Making the world a better place with SPSS: analysing & predicting
charity donor behaviour using SPSS Base
by John Sauve-Rodd, Datapreneurs

* Applications of OMS by Gilbert MacKenzie, University of Limerick

* Multivariate aspects of testing the savannah hypothesis of shopping
by Charles Dennis, Brunel University

* Mousing with SPSS: useful point and click
by Frances Provan, University of Edinburgh


* Users' Question Time and Clinic

* Annual General Meeting of ASSESS.

Registration and coffee will start at 10am. Papers and other events will run
from 10.30am to about 5.00pm. Morning coffee, lunch and afternoon tea are
included in the registration fee. A timetable will be e-mailed to delegates
in advance of the meeting.

______________________
** The titles and order of events are subject to amendment.
--------------------------------------------------------------------------------

                              BOOKING FORM
                        ASSESS : SPSS USERS' GROUP
         Friday 9th November 2007, ARRC auditorium, University of York

Important:

Bookings will not be treated as firm until a cheque or official (company)
order, payable to ASSESS, is received. Payment is possible by BACS; details
on request.


Name:  ______________________________ Tel: ____________________
Email: ______________________________ Fax: ____________________

Job Title:    ___________________________________
Organization: ___________________________________
Address:      ___________________________________________________
               ___________________________________________________
               _______________________   Postcode  _______________

Strike out the sections which do not apply to you, or otherwise amend as
appropriate:

INDIVIDUAL BOOKING. Please reserve a place for me, at a cost of 65 GBP.

CORPORATE BOOKING. (Enter the appropriate amounts)

Please reserve ______ places, at a cost of £_____ (65 GBP for the first
person and 55 GBP for each subsequent person).

Names of attendees : 1. _______________________________________
(for badges)         2. _______________________________________
                      3. _______________________________________
                      4. _______________________________________
                      5. _______________________________________

STUDENT (POST-GRADUATE) BOOKING. (Enclose photocopied evidence of status
for 2007-2008 academic year). Please reserve for me one of the student
places at a cost of 40 GBP.

Specify vegetarian or other dietary requirements, if any:
________________________________________________________________

Cheque or official order enclosed for _______GBP

For official orders please also give here the number and address for
invoicing:
________________________________________________________________
________________________________________________________________
________________________________________________________________


(Please indicate if you require a receipt of payment)

Return completed forms to: Peter Watson, MRC Cognition and Brain
Sciences Unit, 15 Chaucer Road, Cambridge, CB2 7EF.

Telephone enquiries about bookings: 01223 355294 x801 (has an answerphone)

E-mail enquiries about bookings: [hidden email]
(important: put "ASSESS" in the Subject field)


------------------------------

Date:    Wed, 25 Jul 2007 03:41:21 -0700
From:    Albert-jan Roskam <[hidden email]>
Subject: Re: data dictionary

DISPLAY VARIABLES / VARIABLES = myvar1 myvar2.
DISPLAY DICTIONARY / VARIABLES = myvar1 myvar2.

Will display only the variables of interest.


--- "Parry, James" <[hidden email]> wrote:

> Hi Ken,
>
> Try File . . . Display Data File Information . . .
> Working File (if you have the data open). I believe
> the only way to limit the variables displayed in
> this command is to physically drop the variables.
>
> -HTH
>
> -----Original Message-----
> From: SPSSX(r) Discussion
> [mailto:[hidden email]] On Behalf Of Ken
> Wood
> Sent: Tuesday, July 24, 2007 1:13 PM
> To: [hidden email]
> Subject: data dictionary
>
> How does one save (in order to print) the
> information about all the variables in a given
> dataset?  That is, the information (or selected
> information) that one sees in the Variable View?
>


Cheers!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Did you know that 87.166253% of all statistics claim a precision of
results that is not justified by the method employed? [HELMUT RICHTER]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



------------------------------

Date:    Wed, 25 Jul 2007 12:48:59 +0200
From:    Maddalena Agonigi <[hidden email]>
Subject: include

I am new to SPSS programming, so any help will be deeply appreciated.

I use INSERT in a script:

INSERT FILE='C:\Test.sps'
  SYNTAX=BATCH Error=Stop CD=YES.

but SPSS gives me the following message:

INSERT <here>FILE='C:\Test.sps
(0) Invalid statement.

Thank you all

------------------------------

Date:    Wed, 25 Jul 2007 05:55:45 -0700
From:    Albert-jan Roskam <[hidden email]>
Subject: Re: include

Hi,

--What SPSS version are you using? INSERT has been
implemented relatively recently (SPSS v13+ (?)). Did
you try using INCLUDE already?
--Are you absolutely sure the sps file is where you
say it is?

Albert-Jan


Cheers!
Albert-Jan


------------------------------

Date:    Wed, 25 Jul 2007 09:49:25 -0400
From:    Ken Wood <[hidden email]>
Subject: Re: data dictionary

Thank you for the many suggestions. For those interested, the suggestions
I received are below.


You can go to the File menu and choose the option to display information
about the current working file or an external file. You will get output
that you can save in .spo format or even copy to Word, Excel or any other
software.


Do Display Dictionary, then go to the Output, select the information
displayed, and either print from there, or export to either RTF or
Excel formats. Recent versions of SPSS provide information on
variable names and structure separately from value labels, which is
a bit annoying.


Try this command in syntax:

display dictionary.


DISPLAY VARIABLES / VARIABLES = myvar1 myvar2.
DISPLAY DICTIONARY / VARIABLES = myvar1 myvar2.

Will display only the variables of interest.

* Send every output table except the Notes table to a text file.
OMS
  /SELECT TABLES
  /EXCEPTIF LABELS = ['Notes']
  /DESTINATION FORMAT = TEXT
    OUTFILE = "dictionary.txt".
DISPLAY DICTIONARY.
OMSEND.

>
> Try File . . . Display Data File Information . . .
> Working File (if you have the data open). I believe
> the only way to limit the variables displayed in
> this command is to physically drop the variables.


Ken Wood, PhD
Research Scientist
KMRREC
West Orange, NJ
973-243-6871

------------------------------

Date:    Wed, 25 Jul 2007 21:58:49 +0800
From:    Hunna Watson <[hidden email]>
Subject: stepwise regression how to include all cases despite missing data

Hi all,

I'm running a stepwise regression of organizational practices on
construction projects that predict project cost growth. I have data for
115 projects, yet some organizational practices were not applicable on
some projects (in a random fashion). The missing data is obviously
purposeful and not due to respondents failing to fill in questionnaires.
SPSS automatically excludes cases with any missing values, or wants to
substitute a value, so I end up with a regression being carried out on
10 projects, which is obviously not useful. Any suggestions for syntax to
include all cases, or other suggestions to rectify this problem?

Thanks in advance,
Hunna Watson

------------------------------

Date:    Wed, 25 Jul 2007 09:12:37 -0500
From:    Melissa Ives <[hidden email]>
Subject: Re: Random date generator

During the assessment process, interviewers are given these instructions
for estimating a date when a client cannot remember specific days or
months. Perhaps you could create a similar algorithm?

Date Guidelines (d/e):  Use the following rules if the participant is
unsure of the exact date:
DAY: Use the 5th for the beginning of the month, 15th for the middle of
the month, and 25th for the end of the month.
MONTH: Use March for early in the year, July for middle of the year, and
October for later in the year, but try to make it so the number of weeks
is about right.

Melissa
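
Melissa's guidelines amount to a lookup table. As a hedged illustration only, here is a minimal Python sketch of the DAY and MONTH rules; the codes "beginning"/"middle"/"end" and "early"/"middle"/"late" are hypothetical labels for the vague answers, exact values pass through unchanged, and the "number of weeks is about right" adjustment is left to judgement rather than encoded.

```python
# Hypothetical codes for vague answers; integers are exact values and pass through.
DAY_FOR_PART_OF_MONTH = {"beginning": 5, "middle": 15, "end": 25}
MONTH_FOR_PART_OF_YEAR = {"early": 3, "middle": 7, "late": 10}  # March, July, October


def impute_day(day):
    """Return an exact day unchanged, or the guideline day for a vague answer."""
    if isinstance(day, int):
        return day
    return DAY_FOR_PART_OF_MONTH[day]


def impute_month(month):
    """Return an exact month unchanged, or the guideline month for a vague answer."""
    if isinstance(month, int):
        return month
    return MONTH_FOR_PART_OF_YEAR[month]


print(impute_day("middle"), impute_month("late"), impute_day(12))  # 15 10 12
```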

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Syed Hashmi
Sent: Thursday, July 19, 2007 9:48 PM
To: [hidden email]
Subject: [SPSSX-L] Random date generator

Dear co-listers,

A dataset that I'm analyzing has a set of dates for events (start and
stop dates) as well as how long those events occurred for. The data for
each date is in three variables (month, day, year). The years are pretty
complete if they are filled in, but the month and day are sometimes
listed as the exact month or date and other times they're listed as
beginning, middle or end of the year (for the month variable) or the
month (for the day variable).

Thus, I have 7 vars (startm, startd, starty, stopm, stopd, stopy,
duration) from which I can deduce the start and stop dates (startdt,
stopdt).
Unfortunately, I have the complete start and stop date for only about half
the cases. The rest are missing either parts of one of the dates (e.g.,
the day) or of both. If I have one of the dates and a duration, I can
calculate the other date.

The reason for this post is that there is a small subset of the
population where I have the complete stop date but am missing the start
day (I have the year and month) and am also missing the duration.  I had
to come up with some way to impute a start date for these cases for
analysis (which will be done with and without these specific cases).  I
know that the event could not be more than a month long. Therefore, what
I was planning on doing was based on the information I have, calculate
the earliest possible start date (e_startdt) up to a month before the
stop date and then randomly pick a date between e_startdt and the stop
date.

Therefore, my query here was this: how can I code for this. I have an
idea of how to do it in SAS but since I'm working in SPSS that doesn't
help much.  I'm assuming that it will be something simple like:

     startdt = e_startdt + RANDOM_DAYS.

where RANDOM_DAYS is a random whole number of days between 0 and
DATEDIFF(stopdt, e_startdt, "days").

So how would I go about doing this? I tried using the help files and all
but couldn't come up with something that worked. Is this the best way to
do this? Any other way that I can do this? Does it matter what kind of
seeding I use for the random number generator?
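
A minimal Python sketch of the e_startdt + RANDOM_DAYS idea, for illustration only. It assumes "no more than a month long" means at most 30 days and that the start month and year are known; the function names are hypothetical. A fixed seed makes the imputed dates reproducible, which is what the seeding question is about (in SPSS itself, SET SEED plays that role before the COMPUTE).

```python
import random
from datetime import date, timedelta


def earliest_start(start_year, start_month, stopdt, max_days=30):
    """Later of: first day of the known start month, or max_days before the stop date."""
    return max(date(start_year, start_month, 1), stopdt - timedelta(days=max_days))


def impute_start(e_startdt, stopdt, rng):
    """Pick a start date uniformly from e_startdt..stopdt, both ends included."""
    span = (stopdt - e_startdt).days  # analogue of DATEDIFF(stopdt, e_startdt, "days")
    if span < 0:
        raise ValueError("e_startdt is after stopdt")
    return e_startdt + timedelta(days=rng.randint(0, span))  # randint includes both ends


rng = random.Random(20070719)  # fixed seed so the imputed dates can be reproduced
stopdt = date(2004, 6, 17)
e_startdt = earliest_start(2004, 5, stopdt)  # 2004-05-18
print(e_startdt, impute_start(e_startdt, stopdt, rng))
```

Because every case draws from its own window, cases with a short window (a start month close to the stop date) automatically get a narrower range of imputed dates.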

Thanks.

- Shahrukh


------------------------------

Date:    Wed, 25 Jul 2007 11:13:35 -0300
From:    Hector Maletta <[hidden email]>
Subject: Re: stepwise regression how to include all cases despite missing
data

Hunna,

Your use of the stepwise method for regression instead of running with all
the variables at once is immaterial for your problem. What you need is some
way of dealing with projects where some organizational practice does not
apply.

You do not give many details about the variables, but I imagine that each
organizational practice might be a dummy variable, either present or absent.
In such a case, you may posit its effect on costs not as a result of
"choosing or not choosing it when it is adequate to choose it" but as a
result of its mere presence. A project may benefit from a practice if (a)
the practice is applicable and (b) it is actually used; otherwise the
project does not benefit from that practice. The absence of a practice may
thus be a result of deliberate choice or impossibility of application, but
in either case it would result in its effect not being observed. In other
words, you may (if your particular situation affords this interpretation)
treat the "missing" cases as negative instances, as zeroes in the dummies,
and proceed with the regression.

If this road is not conceptually adequate, you're in trouble.

Hector

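Hector's suggestion amounts to recoding a blank ("practice not used") to 0 before fitting. The toy Python sketch below, with made-up ratings for four hypothetical projects, only illustrates why this matters for the case count: listwise deletion keeps just the rows with no blanks, while the recoded data keep every row. In SPSS syntax the recode step would be along the lines of a RECODE with (SYSMIS=0) over the practice variables; whether 0 is a defensible value is the conceptual question discussed in this thread, not something the code settles.

```python
# Hypothetical ratings: one row per project, one column per practice; None = left blank.
projects = [
    [4, None, 2],
    [None, 3, 5],
    [1, 2, 3],
    [5, 4, None],
]


def complete_cases(rows):
    """Rows with no missing value: what listwise deletion keeps."""
    return [row for row in rows if all(v is not None for v in row)]


def recode_missing_to_zero(rows):
    """Treat 'practice not used' as 0, per the reinterpretation suggested above."""
    return [[0 if v is None else v for v in row] for row in rows]


print(len(complete_cases(projects)))                          # 1
print(len(complete_cases(recode_missing_to_zero(projects))))  # 4
```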

------------------------------

Date:    Wed, 25 Jul 2007 10:24:27 -0400
From:    Mark A Davenport MADAVENP <[hidden email]>
Subject: Re: stepwise regression how to include all cases despite missing
data

Hector,

When I worked at ACT, Inc we often treated the student's school identifier
this way, usually with great success.  Granted, we had many thousands of
cases to draw from.  Hunna only has 115?  She is going to run out of cases
pretty quickly, don't you think?


***************************************************************************************************************************************************************
Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more
than an exact answer to an approximate question.' --a paraphrase of J. W.
Tukey (1962)

------------------------------

Date:    Wed, 25 Jul 2007 11:36:49 -0300
From:    Hector Maletta <[hidden email]>
Subject: Re: stepwise regression how to include all cases despite missing
         data

Hunna:

Even with a scale, the "missing" responses can be reinterpreted as saying
"This practice was not effective in this case because, for one reason or
another, it was not used". This is not quite clean conceptually, but it is
your only choice unless you put up with working with 10 cases.

The problem, apparently, is in the design of the questionnaire, which asks
about the effectiveness of a practice that is not in universal use among the
cases under analysis. In any case, a practice cannot have any effectiveness
if it is not used, so I insist you can treat it as having zero effectiveness
when it was not used.

On the other hand, since your variables seem to be many and your cases
few, perhaps you should consider a more artisanal approach for identifying
effective strategies instead of your all-in-one regression. With 87
predictors and 115 cases you don't have a chance even without a single
missing value.

Hector

  _____

From: Hunna Watson [mailto:[hidden email]]
Sent: 25 July 2007 11:36
To: Hector Maletta
Subject: FW: Re: stepwise regression how to include all cases despite
missing data



Thanks for your reply. I've just come on board this project in the past two
weeks, and the data have already been collected. This is essentially what
happened, though I'm simplifying it: respondents rated how effective the use
of the strategy was for preventing cost growth in the form of work that had
to be done again on the project, so I have data on a scale and no
possibility for coding absent or present :S

Extra information....

Yes, I know all the horrible things about stepwise, but it is the only
suitable method I can think of to answer the research questions; I have
just come on board the project in the last two weeks. The research is very
exploratory and the topic hasn't been examined before. Data have been
collected on many different predictor variables (design-related sources,
subcontractor sources, site management sources, contract documentation;
the list goes on and on, up to a terrible 87 predictors). There are 115
projects, so each is a case if you like, and we want to first look at this
data set (no options there), but after that we can merge it with another
data set containing information on a further 160 projects. Some predictors
weren't relevant to some projects; for instance, some didn't use
incentives. We have ratings on scales of 1 to 5 (assessing raters'
perceptions of the contribution of the use of that method to costs), and we
are seeking to predict costs from the predictor variables. If a method
wasn't applicable (e.g., use of a particular incentive plan), it has been
left blank on the questionnaires. There is no logical ordering.


------------------------------

Date:    Wed, 25 Jul 2007 11:48:56 -0400
From:    "William B. Ware" <[hidden email]>
Subject: Old Sunflower Option

In the older versions of SPSS, there was something called the sunflower
option to show multiple cases at the same point in a scatter plot.  Does
anyone know how to do that in Version 14 and up?

Bill

__________________________________________________________________________
William B. Ware, Professor                         Educational Psychology,
CB# 3500                                       Measurement, and Evaluation
University of North Carolina                         PHONE  (919)-962-7848
Chapel Hill, NC      27599-3500                      FAX:   (919)-962-1533
Office:  118 Peabody Hall                            EMAIL: [hidden email]
Adjunct Professor                                    School of Social Work
__________________________________________________________________________

------------------------------

Date:    Wed, 25 Jul 2007 10:31:24 -0600
From:    ViAnn Beadle <[hidden email]>
Subject: Re: Old Sunflower Option

The nearest equivalent I can think of to the old sunflower option is the
bin.hex function in GPL. It groups together nearby points, and then you use
the summary.count function to count the number of hits within each bin to
set the size of the point displayed. This produces what is sometimes
referred to as a bubble plot.

Here's some sample syntax which produces a "bubble" plot using the
Employee data.sav sample file. The SCALE statement constrains the minimum
point size to 5 pixels. IMHO, the default sizing creates really, really
small 1-pixel points which look like dust on my monitor, so this gets around
that. The color.interior function fills in the points (which default to
circles). I think the default hollow points are ugly, so this gets around
that.


GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=prevexp jobtime
    MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
 SOURCE: s=userSource(id("graphdataset"))
 DATA: prevexp=col(source(s), name("prevexp"))
 DATA: jobtime=col(source(s), name("jobtime"))
 GUIDE: axis(dim(1), label("Previous Experience (months)"))
 GUIDE: axis(dim(2), label("Months since Hire"))
 SCALE: linear(aesthetic(aesthetic.size), aestheticMinimum(size."5px"))
 ELEMENT: point(position(bin.hex(prevexp*jobtime, dim(1,2))),
   size(summary.count()), color.interior(color.blue))
END GPL.

The IGRAPH procedure also provides a jittering option which nudges the
points slightly apart by adding a small amount of random variation, but I
think the GPL binhex approach works much better.
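
The counting step behind this kind of plot can be sketched in a few lines of Python. This is a hedged illustration, not what GPL does internally: it uses square cells rather than bin.hex's hexagons, the cell width is an arbitrary assumption, and the (prevexp, jobtime) pairs are made up. Each cell's count would drive the marker size, much as summary.count() does in the GGRAPH example.

```python
import math
from collections import Counter

WIDTH = 5  # assumed cell size, in months


def bin_counts(points, width):
    """Count points per square cell of side `width` (a square-grid stand-in for bin.hex)."""
    return Counter((math.floor(x / width), math.floor(y / width)) for x, y in points)


# Made-up (prevexp, jobtime) pairs; the first three fall in the same cell.
points = [(12, 80), (12, 80), (14, 82), (100, 70), (36, 95)]
counts = bin_counts(points, WIDTH)
for (bx, by), n in sorted(counts.items()):
    # Lower-left corner of the cell and the count that would set the marker size.
    print((bx * WIDTH, by * WIDTH), n)
```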


------------------------------

Date:    Wed, 25 Jul 2007 18:39:32 +0200
From:    Georg Maubach <[hidden email]>
Subject: AW:      Re: Old Sunflower Option

Hi All,

We tried to run the syntax example on our "Employee data.sav" file. As we
have a German version the variable names were translated to German. Does
anybody know how we could obtain sample files in English?

Best regards

Georg Maubach
Research Manager



------------------------------

Date:    Wed, 25 Jul 2007 13:28:22 -0500
From:    "Hashmi, Syed S" <[hidden email]>
Subject: Re: Random date generator

Thanks Melissa,

I'm already doing something similar to what you said, using the 1st, 10th
and 20th as the days for the beginning, middle and end of the month. The
month is a bit trickier since my exposures are events during pregnancy, so I
have to be careful about just assigning a random month lest it fall
outside the pregnancy duration.

The question I had asked concerned dates which had month and year but
no day information, not even beginning, middle or end. Therefore, I
couldn't even depend on the 1st-10th-20th coding.

My final solution for the problem where I had a start date and had a
stop month was to pick a random date between the start date (or the
first date of the stop month) and the last day of the stop month.  Gene
Maguin had emailed me earlier and suggested I use the UNIFORM function
to randomly select a date.  I don't think that message was posted on the
list-serv, so the body is copied below.  I've used the function since
and it works nicely.
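
A minimal Python sketch of that final rule, for illustration only: draw a date uniformly between a lower bound (the known start date, or the first day of the stop month if that is later) and the last day of the stop month. calendar.monthrange supplies the month length, so February in leap years is handled; the function name and arguments are hypothetical. In SPSS the draw itself would come from UNIFORM, as in Gene's message below.

```python
import calendar
import random
from datetime import date, timedelta


def impute_stop_date(stop_year, stop_month, rng, startdt=None):
    """Uniform random date from max(startdt, first of stop month) to the end of that month."""
    first = date(stop_year, stop_month, 1)
    days_in_month = calendar.monthrange(stop_year, stop_month)[1]
    last = date(stop_year, stop_month, days_in_month)
    lower = first if startdt is None else max(first, startdt)
    if lower > last:
        raise ValueError("start date falls after the stop month")
    return lower + timedelta(days=rng.randint(0, (last - lower).days))


rng = random.Random(42)  # fixed seed for reproducibility
print(impute_stop_date(2004, 2, rng, startdt=date(2004, 2, 10)))  # within 2004-02-10..2004-02-29
```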

Thanks again for your help though. I agree that there should be some
sort of algorithm in place at the interviewer level to minimize the
frequency of incomplete data.

- S. Hashmi

*copy of email from Gene*

> -----Original Message-----
> From: Gene Maguin [mailto:[hidden email]]
> Sent: Friday, July 20, 2007 8:05 AM
> To: Hashmi, Syed S
> Subject: RE: Random date generator
>
> Syed,
>
> I'd like to be helpful to you but I don't have time to make up a full
> solution. I think this would be a valid example of your question.
>
> Start date (mm/dd/yyyy): 5/x/2004
> Stop date (mm/dd/yyyy):  6/17/2004
> Possible duration range (6/17/2004)-(5/31/2004)=17 days to
> (6/17/2004)-(5/18/2004)=30 days (I assume a 30 day month)
>
> So x has to be between 18 and 31 inclusive.
>
> So I think the trick to the random draw is this command.
>
> Compute x=uniform(14).
> Compute x=trunc(x).
>
> Check this, but I'm pretty sure that the range of x will be 0 to 13.
> Your actual day of the month is then
> Compute x=x+18.
>
> There are lots of big 'little bits' to tidy up, but this will get you
> what you want when the tidying up has been done.
>
> Best wishes, Gene Maguin
>
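Gene's trick generalizes: the feasible start days are the overlap of the known start month and the window of at most 30 days before the stop date, and a uniform whole-number draw over that overlap reproduces his trunc(uniform(14)) + 18. Below is a minimal Python sketch, assuming a 30-day maximum duration; the helper name impute_start is hypothetical. On the seeding question: a fixed seed only makes the draw reproducible, it does not make it more or less uniform. (In SPSS itself, date values are stored as seconds, so a day offset would have to be multiplied by 86400 before being added to a date variable.)

```python
import calendar
import random
from datetime import date, timedelta


def impute_start(start_year, start_month, stop, max_days=30, rng=random):
    """Draw a start date uniformly from the feasible window (hypothetical helper).

    The window is the overlap of the known start month and the
    interval [stop - max_days, stop].
    """
    last_day = calendar.monthrange(start_year, start_month)[1]
    earliest = max(date(start_year, start_month, 1), stop - timedelta(days=max_days))
    latest = min(date(start_year, start_month, last_day), stop)
    if earliest > latest:
        raise ValueError("no start date in that month fits the maximum duration")
    # randint is inclusive at both ends, like trunc(uniform(n)) over 0..n-1
    offset = rng.randint(0, (latest - earliest).days)
    return earliest + timedelta(days=offset)


# Gene's example: start month 5/2004, stop date 6/17/2004
rng = random.Random(20070720)  # fixed seed so the draw can be reproduced
print(impute_start(2004, 5, date(2004, 6, 17), rng=rng))
```

With these inputs the window runs from 5/18/2004 to 5/31/2004, the same 14 days (18 to 31) that Gene derived by hand.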


> -----Original Message-----
> From: Melissa Ives [mailto:[hidden email]]
> Sent: Wednesday, July 25, 2007 9:13 AM
> To: Hashmi, Syed S; [hidden email]
> Subject: RE: [SPSSX-L] Random date generator
>
> During the assessment process, interviewers are given these instructions
> for estimating a date when a client cannot remember specific days or
> months. Perhaps you could create a similar algorithm?
>
> Date Guidelines (d/e):  Use the following rules if the participant is
> unsure of the exact date:
> DAY: Use the 5th for the beginning of the month, 15th for the middle of
> the month, and 25th for the end of the month.
> MONTH: Use March for early in the year, July for middle of the year, and
> October for later in the year, but try to make it so the number of weeks
> is about right.
>
> Melissa
>
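Melissa's interviewer rules amount to a lookup from vague codes to fixed days and months. A sketch in Python, assuming the vague entries arrive as the hypothetical strings 'beginning'/'middle'/'end' (day) and 'early'/'middle'/'late' (month); exact numeric entries pass through unchanged. The guideline's "make the number of weeks about right" adjustment is not captured here.

```python
from datetime import date

# Fixed substitutes from the interviewer guidelines quoted above
DAY_FOR = {"beginning": 5, "middle": 15, "end": 25}
MONTH_FOR = {"early": 3, "middle": 7, "late": 10}


def estimate_date(year, month, day):
    """Replace vague month/day codes with the guideline values.

    Numeric months and days are not keys in the lookups, so they
    pass through unchanged.
    """
    return date(year, MONTH_FOR.get(month, month), DAY_FOR.get(day, day))


print(estimate_date(2004, "middle", "end"))
```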
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
> Syed Hashmi
> Sent: Thursday, July 19, 2007 9:48 PM
> To: [hidden email]
> Subject: [SPSSX-L] Random date generator
>
> Dear co-listers,
>
> A dataset that I'm analyzing has a set of dates for events (start and
> stop dates) as well as how long those events lasted.  The data for
> each date are in three variables (month, day, year). The years are
> pretty complete if they are filled in, but the month and day are
> sometimes listed as the exact month or day and other times as
> beginning, middle or end of the year (for the month variable) or of
> the month (for the day variable).
>
> Thus, I have 7 vars (startm, startd, starty, stopm, stopd, stopy,
> duration) from which I can deduce the start and stop dates (startdt,
> stopdt).
> Unfortunately, I have the complete start and stop dates for only about
> half the cases. The rest are missing parts of one of the dates (e.g.,
> the day) or of both.  If I have one of the dates and a duration, I can
> calculate the other date.
>
> The reason for this post is that there is a small subset of the
> population where I have the complete stop date but am missing the start
> day (I have the year and month) and am also missing the duration.  I had
> to come up with some way to impute a start date for these cases for
> analysis (which will be done with and without these specific cases). I
> know that the event could not be more than a month long. Therefore,
> what I was planning on doing was, based on the information I have, to
> calculate the earliest possible start date (e_startdt) up to a month
> before the stop date and then randomly pick a date between e_startdt
> and the stop date.
>
> Therefore, my query here is this: how can I code this? I have an
> idea of how to do it in SAS, but since I'm working in SPSS that doesn't
> help much.  I'm assuming that it will be something simple like:
>
>      startdt = e_startdt + RANDOM_DAYS.
>
> where RANDOM_DAYS is a random whole number of days between 0 and
> DATEDIFF(stopdt, e_startdt, "days").
>
> So how would I go about doing this? I tried using the help files and all,
> but couldn't come up with something that worked. Is this the best way to
> do this? Is there any other way that I can do this? Does it matter what
> kind of seeding I use for the random number generator?
>
> Thanks.
>
> - Shahrukh
>

------------------------------

Date:    Wed, 25 Jul 2007 15:05:07 -0400
From:    Richard Ristow <[hidden email]>
Subject: Re: Random date generator

Somehow I missed or deleted the original posting in this thread.
Anyway, on Thursday, July 19, 2007 9:48 PM Hashmi, Syed S asked,

>A dataset that I'm analyzing has a set of dates for events (start and
>stop dates) as well as how long those events lasted.  The data
>for each date are in three variables (month, day, year). The years are
>pretty complete if they are filled in, but the month and day are
>sometimes listed as the exact month or day and other times
>as beginning, middle or end of the year (for the month
>variable) or of the month (for the day variable).
>
>I have [two dates as three variables each, plus a duration].
>I have the complete start and stop dates for about half the cases. The
>rest are missing parts of one of the dates (e.g., the day) or of
>both.  If I have one of the dates and a duration, I can calculate the
>other date.

So far, so good, though be careful about how precise your 'durations'
are.
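The "one date plus a duration gives the other" step might look like this in Python. It is a sketch that assumes duration is a whole number of days measured as stop minus start, which is exactly the kind of precision Richard says to check; the function names are hypothetical, and the arguments simply follow the seven variables listed above.

```python
from datetime import date, timedelta


def build(year, month, day):
    """Return a date, or None if any part is missing."""
    if None in (year, month, day):
        return None
    return date(year, month, day)


def complete_dates(startm, startd, starty, stopm, stopd, stopy, duration):
    """Fill in whichever date is missing when the other one and duration exist.

    Assumes duration = (stop - start) in whole days.
    """
    start = build(starty, startm, startd)
    stop = build(stopy, stopm, stopd)
    if duration is not None:
        if start is None and stop is not None:
            start = stop - timedelta(days=duration)
        elif stop is None and start is not None:
            stop = start + timedelta(days=duration)
    return start, stop
```

Cases where a date part and the duration are both missing come back with a None and are left for whatever imputation strategy is chosen.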

>There is a small subset of the population where I have the complete
>stop date but am missing the start day (I have the year and month) and
>am also missing the duration.  I had to come up with some way to
>impute a start date for these cases for analysis (which will be done
>with and without these specific cases).  I know that the event could
>not be more than a month long. I was planning to calculate the earliest
>possible start date (e_startdt) up to a month before the stop date and
>then randomly pick a date between e_startdt and the stop date.

OUCH! I would not do this. Period.

*MAYBE* the start dates and durations you get this way will be vaguely
representative of the population of events, though I doubt it. Are your
durations roughly uniformly distributed from 0 to 30 days? For goodness'
sake, you ought to check that before proceeding.
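One way to run the check Richard asks for is a Pearson chi-square goodness-of-fit test against a discrete uniform distribution on 0 to 30 days. A stdlib-only Python sketch; the weekly bins are an arbitrary choice, durations over 30 days are set aside, and with 4 degrees of freedom the 5% critical value is about 9.49. The approximation needs expected counts of roughly 5 or more per bin, so the short last bin (days 28 to 30) needs enough cases.

```python
# Weekly bins covering 0..30 days; the last bin holds only days 28-30
EDGES = (0, 7, 14, 21, 28, 31)
SUPPORT = 31  # number of possible whole-day durations, 0 through 30


def uniformity_chi2(durations):
    """Pearson chi-square of binned durations vs. a uniform 0..30 days.

    Returns (statistic, degrees_of_freedom).  Durations outside 0..30
    are dropped before binning.
    """
    values = [d for d in durations if 0 <= d <= 30]
    n = len(values)
    if n == 0:
        raise ValueError("no durations between 0 and 30 days")
    stat = 0.0
    for lo, hi in zip(EDGES, EDGES[1:]):
        observed = sum(lo <= d < hi for d in values)
        expected = n * (hi - lo) / SUPPORT  # bins hold different numbers of days
        stat += (observed - expected) ** 2 / expected
    return stat, len(EDGES) - 2
```

A statistic well above 9.49 would suggest the observed durations are not uniform, and a uniform draw of start dates would then be hard to defend.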

But even if they're representative of the population, they have nothing
to do with the individual cases for which they're 'imputed'. No
analysis using those 'dates' will be the least bit trustworthy.

A far better approach is to use true missing-value interpolation on the
*durations*, not the dates. (See SPSS 'MVA'.) I'm not clear how many
durations you'd have to impute. If it's near 50%, that won't be at all
reliable, either.
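Richard points to model-based imputation of the durations (SPSS MVA). As a much cruder illustration of imputing the duration rather than the date, the Python sketch below uses a simple hot-deck draw, which is a different technique from MVA's: it picks a duration at random from the observed durations that are consistent with the known start month and the stop date, then derives the start date. The 30-day cap and the function name are assumptions.

```python
import calendar
import random
from datetime import date, timedelta


def hot_deck_start(start_year, start_month, stop, observed, max_days=30, rng=random):
    """Impute (start_date, duration) by drawing a donor duration.

    Only observed durations that put the start inside the known month
    and do not exceed max_days are eligible.  Returns None if no
    donor fits.
    """
    last_day = calendar.monthrange(start_year, start_month)[1]
    shortest = (stop - date(start_year, start_month, last_day)).days
    longest = (stop - date(start_year, start_month, 1)).days
    low, high = max(shortest, 0), min(longest, max_days)
    donors = [d for d in observed if low <= d <= high]
    if not donors:
        return None
    duration = rng.choice(donors)
    return stop - timedelta(days=duration), duration
```

Unlike the uniform draw, this follows the shape of the observed durations, but it still ignores other covariates, so it is no substitute for proper missing-value methods.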

-Good luck,
  Richard

------------------------------

Date:    Wed, 25 Jul 2007 15:09:53 -0500
From:    "Hashmi, Syed S" <[hidden email]>
Subject: Re: Random date generator

Richard,

Thanks for your input.  I realize that I was stepping into extremely
treacherous territory when I decided to impute dates and select random
ones.  As for the durations being roughly uniformly distributed, that's
what it looks like from the data I do have.  Initially, I'd assumed that
durations would have a mean of about 7 days but somehow the data I do
have doesn't seem to show that.  It's more or less uniformly
distributed.  There were some durations that were >30 days but I doubt
if they're true.  Therefore, I decided to go ahead with the uniform
distribution (although, the whole imputation and random selection still
bothers me).

The reason that I'm trying to get an idea about the dates, especially
the event start dates, is due to the nature of the study question. I'm
looking at the occurrence of certain events during pregnancy.  However,
these events of interest have to occur within the first trimester, or if
I narrow it down further, the first two months of pregnancy.  Therefore,
I have to know if an event occurred within a certain period of time
after the last menstrual date as reported by the woman.  At the end of
the day, the variables for all the events get filtered down to a single
dichotomous variable - Y/N did the event occur during the period of
interest?
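Since the end product is a single yes/no flag, a useful first step may be to see which incomplete cases actually need imputation: if every feasible start date falls inside (or every one falls outside) the window after the last menstrual date, the flag is determined regardless of the missing day. A Python sketch; the 60-day window standing in for "the first two months" and the function name are assumptions.

```python
from datetime import date, timedelta


def exposure_in_window(lmp, earliest_start, latest_start, window_days=60):
    """Classify an event whose start lies somewhere in [earliest, latest].

    True  -> every feasible start is within window_days after lmp
    False -> no feasible start is in that window
    None  -> the answer depends on the unknown day (needs imputation)
    """
    window_end = lmp + timedelta(days=window_days)
    if lmp <= earliest_start and latest_start <= window_end:
        return True
    if latest_start < lmp or earliest_start > window_end:
        return False
    return None
```

Only the cases returning None depend on how the missing day is filled in, which may turn out to be a smaller set than all cases with incomplete dates.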

I will do the analysis with and without the cases where the dates have
been imputed from incomplete data.  I hadn't previously thought of using
true missing-value interpolation on the durations, but I'll look into it.
I've never done that before, so I will have to read up a bit on it.  I
might have an issue with the number of missing values, though, since more
cases have at least some part of the date than have a duration value.

Thanks again for your advice. It's always nice to get a fresh look at an
issue.

- Shahrukh

------------------------------

Date:    Wed, 25 Jul 2007 16:24:09 -0400
From:    Gene Maguin <[hidden email]>
Subject: Re: Random date generator

Syed,

It sounds like you are going to use the imputed dates to decide if
something happened or not. The new variable, 'something happened or not',
might be a dependent variable or it might be an independent variable.
There's a literature on estimating relationships in the presence of missing
data. To correctly estimate relationships (or, at least, come very close),
you should use either multiple imputation or a maximum likelihood
estimation method that incorporates the EM algorithm. So far as I know,
SPSS has neither. The key person here is Donald Rubin. But there are
other, more recent articles.

Gene Maguin
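For reference, the multiple-imputation route Gene mentions ends by pooling the m analyses with Rubin's rules: average the estimates, and add the average within-imputation variance to (1 + 1/m) times the between-imputation variance. A minimal Python sketch of just that pooling step, assuming you already have an estimate and its squared standard error from each of m (at least 2) imputed datasets; the function name is hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m imputed-data results with Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations, one variance each")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return q_bar, total
```

The imputation step itself (drawing plausible values from a model for the missing data) is the hard part and is not shown here.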

------------------------------

Date:    Wed, 25 Jul 2007 20:43:06 +0000
From:    Hamish Travers <[hidden email]>
Subject: Can someone please tell me how to unsubscribe from this forum,
         thanks in advance

------------------------------

Date:    Wed, 25 Jul 2007 16:41:18 -0500
From:    "Hashmi, Syed S" <[hidden email]>
Subject: Re: Random date generator

Thanks Gene,

After the comments that you and Richard made, I'm seriously considering
rethinking the whole thing.  Maximum likelihood estimation was something
that I had thought of initially but didn't follow up on.  I guess it's
time that I do.  Thanks again for your help.

- Shahrukh


------------------------------

Date:    Wed, 25 Jul 2007 18:33:35 -0400
From:    Richard Ristow <[hidden email]>
Subject: Re: Can someone please tell me how to unsubscribe

FAQ: How to unsubscribe, or leave list SPSSX-L:

Requests to unsubscribe that are posted to the list will never be
acted on.

You must send the request to [hidden email].

From the E-mail address from which you're subscribed to the list, send
a message to [hidden email] with the following words in the
body of the message:

SIGNOFF SPSSX-L

Don't put anything else (your name, etc.) in the body of the message.

It should work. If it doesn't, go to the following Web page:

http://www.listserv.uga.edu/cgi-bin/wa?SUBED1=spssx-l&A=1

and unsubscribe from there.

...........................
More information:

When you subscribed to the list, you received a welcome message (I'm
copying it below) with instructions (including asking you to save it).

From the welcome message:

>Your  subscription to  the SPSSX-L  list (SPSSX(r)  Discussion)
>has  been accepted.
>
>Please save this message for future  reference, [...]
>
>You may leave the list at any time by sending a "SIGNOFF SPSSX-L"
>command to [hidden email].

There are many other commands that can be sent to the same address, to
manage your subscription. If you send mail to [hidden email]
with the text

INFO REFCARD

and no other text, you will be mailed a file describing those commands.

------------------------------

Date:    Wed, 25 Jul 2007 16:00:22 -0700
From:    Karen Powers <[hidden email]>
Subject: RENAME LOOP?

Hello SPSS list,
I have a dataset in which the variable names are var002, var003, ...
var3477.

I would like to RENAME each variable with the number listed in row 1 of
each respective column.
The first number in var002 is 6951030.  I would like var002 renamed
"rs6951030".
The command for doing this once that runs nicely is:

RENAME VARS var002 = "rs"+ "6951030".
EXE.

Now I would like to do this for all vars through var3477.
I have tried DO REPEAT and LOOP commands but they both say
>Warning # 141.  Command name: RENAME VARS
>DO REPEAT has no effect on this command.

Any ideas on how I can do this (3477 times)?

Thanks, Karen
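Since RENAME VARIABLES is not affected by DO REPEAT, one workaround is to generate the RENAME command as text and then run it. A sketch in plain Python, assuming the first-row values have already been read into a dict mapping each old name to its number (the input dict and the function name are hypothetical). If the SPSS Python programmability plug-in is installed, a string like this could be passed to spss.Submit; the code below only builds the text.

```python
def build_rename(first_row, prefix="rs"):
    """Build one RENAME VARIABLES command from {old_name: number}."""
    new_names = {old: f"{prefix}{int(num)}" for old, num in first_row.items()}
    if len(set(new_names.values())) != len(new_names):
        raise ValueError("two columns would get the same new name")
    # One (old = new) pair per line keeps each syntax line short
    pairs = "\n".join(f"  ({old} = {new})" for old, new in new_names.items())
    return "RENAME VARIABLES\n" + pairs + "."


print(build_rename({"var002": 6951030, "var003": 123}))
```

Building a single command avoids running 3,476 separate RENAME commands, and the duplicate check catches two columns that share the same number in row 1.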

------------------------------

Date:    Wed, 25 Jul 2007 19:27:19 -0400
From:    Richard Ristow <[hidden email]>
Subject: Re: Aggregating with missing data

At 03:30 AM 7/14/2007, Marco wrote:

>When using the aggregate (mean) function in SPSS, cells that contain
>missing data become empty. Thus, a cell that should contain the mean
>of multiple cells (one of which is empty/missing), turns to zero
>because it contains one missing datum.

I'm not sure what's happening to you, but you shouldn't be seeing what
you say you're seeing.

From your description, it sounds like you're doing one of two things:
a) Using the MEAN function with command AGGREGATE to average over a set
of variables
b) Using the MEAN function in the transformation language to average
over a set of variables.

BOTH of those, however, ignore missing values when averaging, and take
the mean of the non-missing values; they don't make a value 0 because
there's a missing value in the list. (That would be a very dangerous
thing to do anyway.) So I'm not sure what's happening.

Could you post the syntax, some test data, what output you get, and
tell us what output you want?

.....................................
Here are demonstrations of averaging across cases with AGGREGATE, and
averaging across variables. It's SPSS 15 draft output (WRR-not saved
separately).
.....................................
Using AGGREGATE to average over cases:
List
|-----------------------------|---------------------------|
|Output Created               |25-JUL-2007 19:16:03       |
|-----------------------------|---------------------------|
[Aggregate]

Group Value

     1     1
     1     2
     1     3
     2     4
     2     5
     2     .
     2     7
     3     .
     3     9
     3    10


Number of cases read:  10    Number of cases listed:  10


AGGREGATE OUTFILE=*
   /BREAK=GROUP
   /Members 'Size of group' = NU
   /MEAN    'Mean of "value"' = MEAN(VALUE).

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |25-JUL-2007 19:16:03       |
|-----------------------------|---------------------------|
Group Members     MEAN

     1       3     2.00
     2       4     5.33
     3       3     9.50

Number of cases read:  3    Number of cases listed:  3
.....................................
Using MEAN to average over variables:
List
|-----------------------------|---------------------------|
|Output Created               |25-JUL-2007 19:16:04       |
|-----------------------------|---------------------------|
[Wide]

Group Members Value.1 Value.2 Value.3 Value.4

     1      3       1       2       3       .
     2      4       4       5       .       7
     3      3       .       9      10       .


Number of cases read:  3    Number of cases listed:  3


NUMERIC Mean (F6.2).
COMPUTE Mean = MEAN(Value.1 TO Value.4).
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |25-JUL-2007 19:16:26       |
|-----------------------------|---------------------------|
[Wide]

Group Members Value.1 Value.2 Value.3 Value.4   Mean

     1      3       1       2       3       .    2.00
     2      4       4       5       .       7    5.33
     3      3       .       9      10       .    9.50

Number of cases read:  3    Number of cases listed:  3

===================
APPENDIX:  All code
===================
I keyed the test data into the Data Editor; however, it can be
recovered from the LIST output fairly easily. Here's all the code:

DATASET ACTIVATE TestData.
DATASET COPY     Aggregate.
DATASET ACTIVATE Aggregate WINDOW=FRONT.

LIST.

AGGREGATE OUTFILE=*
   /BREAK=GROUP
   /Members 'Size of group' = NU
   /MEAN    'Mean of "value"' = MEAN(VALUE).

LIST.

DATASET ACTIVATE TestData.
DATASET COPY     Wide.
DATASET ACTIVATE Wide      WINDOW=FRONT.

SORT CASES BY Group .
CASESTOVARS
  /ID = Group
  /GROUPBY = VARIABLE
  /COUNT = Members "Size of group" .

LIST.

NUMERIC Mean (F6.2).
COMPUTE Mean = MEAN(Value.1 TO Value.4).

LIST.
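The rule both demonstrations follow (the count of cases includes missing values, the mean uses only the valid ones) can be written out in a few lines of Python. This is only a sketch of the rule, not of SPSS internals, with None standing in for the system-missing value and Richard's test data as input; the function name is hypothetical.

```python
from collections import defaultdict


def aggregate_mean(rows):
    """Mimic AGGREGATE with NU (cases per group) and MEAN (valid values)."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    result = {}
    for group, values in groups.items():
        valid = [v for v in values if v is not None]
        mean = sum(valid) / len(valid) if valid else None  # None if all missing
        result[group] = (len(values), mean)
    return result


rows = [(1, 1), (1, 2), (1, 3), (2, 4), (2, 5), (2, None), (2, 7),
        (3, None), (3, 9), (3, 10)]
for group, (members, mean) in sorted(aggregate_mean(rows).items()):
    print(group, members, round(mean, 2))
```

This gives the same group sizes and means as the LIST output above (2.00, 5.33 and 9.50); no group's mean drops to zero because of a missing value.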

------------------------------

Date:    Wed, 25 Jul 2007 20:24:54 -0400
From:    Richard Ristow <[hidden email]>
Subject: Re: Matching files on one of three possible ID's

At 09:29 AM 7/12/2007, Daniel Robertson wrote:

>I have two database extracts that I am trying to merge, one of which
>contains enrolled students and the other contains approximately the
>same group of students when they were applicants. In the Enrollment
>file students are uniquely identified by 'enroll_id'. In the Applicant
>file there is a primary ID, 'applicant_id1', but there may be up to
>two other IDs which were issued and updated provisionally as the
>student was going through the application process. The rub is that
>'enroll_id' may match any one of the applicant IDs, not necessarily
>the primary one.

Gene's given you a workable solution. It requires sorting the data
three times, but with three keys, something like that is inevitable.

You *can* combine the three sorting operations into one step, by using
XSAVE to create three copies of each Applicant record, one each in
which 'enroll_id' is loaded from each of the three candidate key
variables in the Applicant file. Then sort the resulting file by that
'enroll_id', MATCH FILES with the Enrollment file, and discard any
Applicant records that don't match.

Now, that's the simplest possible case. You may need logic in case,
say, the same ID value occurs in more than one of the Applicant-record
fields. But it's another way to go.

Sorry; no code this time.
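For readers who want something concrete, here is a rough Python sketch (not SPSS syntax) of the "three copies" idea: each Applicant record is copied once per non-missing candidate ID, the copies are keyed by that ID, and only keys that appear in the Enrollment file are kept. The names applicant_id2 and applicant_id3 are hypothetical, since only applicant_id1 is named in the question. Repeated IDs within one record are skipped, which covers one of the details Richard mentions; an enroll_id that matches several Applicant records comes back with all of them, so those cases can be inspected.

```python
def match_on_any_id(applicants, enrolled_ids,
                    id_fields=("applicant_id1", "applicant_id2", "applicant_id3")):
    """Return {enroll_id: [matching applicant records]}."""
    # Step 1: one copy per non-missing candidate ID (the XSAVE step)
    copies = {}
    for rec in applicants:
        seen = set()
        for field in id_fields:
            key = rec.get(field)
            if key is None or key in seen:
                continue  # skip missing IDs and repeats within one record
            seen.add(key)
            copies.setdefault(key, []).append(rec)
    # Step 2: keep only copies whose key matches an enrolled student
    return {eid: copies[eid] for eid in enrolled_ids if eid in copies}
```

Enrolled students with no matching applicant are simply absent from the result, which mirrors discarding the unmatched records after MATCH FILES.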

------------------------------

End of SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)
**************************************************************

Re: SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)

Melissa Ives
In reply to this post by Chelminski, Iwona
Hmm, I would think about looking into Version 16, which is about to come out in a few months and promises to be less 'quirky' than 15.

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Chelminski, Iwona
Sent: Thursday, July 26, 2007 10:52 AM
To: [hidden email]
Subject: Re: [SPSSX-L] SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)

Hi Group,
I'm thinking about buying the 15 version. Is it any good? They want over $1,500 for it so i want to make sure that it's not a piece of crap like the version 10 was.
Any comments?
Thanks in advance

Iwona


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
Automatic digest processor
Sent: Thursday, July 26, 2007 12:01 AM
To: Recipients of SPSSX-L digests
Subject: SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)


There are 24 messages totalling 1629 lines in this issue.

Topics of the day:

  1. SPSS 15 loses reg. with Vista
  2. ANNOUNCE: SPSS USERS MEETING (INC BOOKING FORM) REMINDER
  3. data dictionary (2)
  4. include (2)
  5. stepwise regression how to include all cases despite missing data (3)
  6. Random date generator (6)
  7. stepwise regression how to include all cases despite missing
     data
  8. Old Sunflower Option (2)
  9. AW:      Re: Old Sunflower Option
 10. Can someone please tell me how to unsubscribe from this forum, thanks in
     advance
 11. Can someone please tell me how to unsubscribe
 12. RENAME LOOP?
 13. Aggregating with missing data
 14. Matching files on one of three possible ID's

----------------------------------------------------------------------

Date:    Wed, 25 Jul 2007 04:36:06 -0400
From:    Nico Munting <[hidden email]>
Subject: Re: SPSS 15 loses reg. with Vista

I hope you have solved this problem by now, but if you haven't you might
try the following.

I do not have experience with SPSS on Vista, but my guess is that SPSS is
trying to store the registration information in C:\Program Files\ or the
equivalent in your situation. However, because in Vista you are not running
all programs as Administrator, SPSS does not have write-access to the
Program Files directory.

To give SPSS write-access to the Program Files directory right click on the
shortcut you are using to launch SPSS and select "Run As...". Choose an
account with Administrative privileges and you might be prompted for a
password or a confirmation. After this SPSS should start normally. Now
complete the registration, and since SPSS was started in Administrative
mode, it should be able to write the registration information to the
Program Files directory.

If all is well you should be able to start SPSS normally from this point
on, and hopefully it remembers your registration.

Good luck,

Nico


On Wed, 4 Jul 2007 06:33:47 +0100, David Hitchin <[hidden email]>
wrote:

>I have installed SPSS 15.0, including the patch to take it to 15.0.1 and
>the additional patch to cope with Vista problems. The new licence
>procedure works, and SPSS functions as expected.
>
>At some point, either when SPSS is stopped or the machine is turned off,
>the registration is lost; next time SPSS is started it has to be
>re-registered.
>
>Any ideas?
>
>David Hitchin

------------------------------

Date:    Wed, 25 Jul 2007 10:14:57 +0100
From:    Peter Watson <[hidden email]>
Subject: ANNOUNCE: SPSS USERS MEETING (INC BOOKING FORM) REMINDER

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

---559023410-1804928587-1185354897=:4092
Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE


A reminder of the ASSESS meeting in November:

ASSESS: SPSS USERS' GROUP
21st ANNUAL MEETING
FRIDAY 9th NOVEMBER 2007
ALCUIN RESEARCH RESOURCE CENTRE AUDITORIUM=20
UNIVERSITY OF YORK, YORK


ASSESS is an independent user group for SPSS, a computer package for analys=
ing=20
and presenting data. It is run by users, for users and is completely indepe=
ndent
  of manufacturers of the software. The meeting is open to all users of SPS=
S and=20
to anyone interested in SPSS.

Come along to:
* hear SPSS users talk about applications,the problems and solutions
* hear the latest news from SPSS UK staff about product developments,
   and put your questions to them
* question a panel of experts about particular problems
* exchange ideas with other SPSS users
* plan for an even better user group.

The venue is the Alcuin Research Resource Centre (ARRC) on York University=
=20
campus located in Heslington, 2 miles to the south-east of the city centre.=
=20
It takes 10-15 minutes in a taxi from the railway station. The Number 4 bus=
=20
runs regularly to the University from York railway station (see=20
http://www.yorkshiretravel.net/). Parking at the University is very difficu=
lt.=20
Location details are at http://www.york.ac.uk/np/maps/. Accommodation is=20
bookable via tourist information on (01904) 621756 or (01904) 554455.

THE PROVISIONAL PROGRAMME**

* Welcome and introduction to meeting

* SPSS company and product news; SPSS software demonstrations

* How and why to document data for long-term storage, and what's special=20
about GI (geographical) data? by Allan Reese, CEFAS

* Making the world a better place with SPSS: analysing & predicting
charity donor behaviour using SPSS Base
by John Sauve-Rodd, Datapreneurs

* Applications of OMS by Gilbert MacKenzie, University of Limerick

* Multivariate aspects of testing the savannah hypothesis of shopping=20
by Charles Dennis, Brunel University

*  Mousing with SPSS: useful point and click=20
by Frances Provan, University of Edinburgh


* Users" Question Time and Clinic

* Annual General Meeting of ASSESS.

Registration and coffee will start at 10am. Papers and other events will ru=
n=20
from 10.30am to about 5.00pm. Morning coffee, lunch and afternoon tea are=
=20
included in the registration fee. A timetable will be e-mailed to delegates
  in advance of the meeting.




______________________
** The titles and order of events are subject to amendment.
---------------------------------------------------------------------------=
-----

                              BOOKING FORM
                        ASSESS : SPSS USERS' GROUP
         Friday 9th November 2007, ARRC auditorium, University of York

Important:

Bookings will not be treated as firm until a cheque or official (company) o=
rder,
payable to ASSESS, is received. Payment possible by BACS. Details on reques=
t.


Name:  ______________________________ Tel: ____________________
Email: ______________________________ Fax: ____________________

Job Title:    ___________________________________
Organization: ___________________________________
Address:      ___________________________________________________
               ___________________________________________________
               _______________________   Postcode  _______________

Strike out the sections which do not apply to you, or otherwise amend as
appropriate:

INDIVIDUAL BOOKING. Please reserve a place for me, at a cost of 65 GBP.

CORPORATE BOOKING. (Enter the appropriate amounts)

Please reserve ______ places, at a cost of =A3_____ (65 GBP, for the first =
person,=20
and 55 GBP for each subsequent person) .

Names of attendees : 1. _______________________________________
(for badges)         2. _______________________________________
                      3. _______________________________________
                      4. _______________________________________
                      5. _______________________________________

STUDENT (POST-GRADUATE) BOOKING. (Enclose photocopied evidence of status
for 2007-2008 academic year). Please reserve for me one of the student
places at a cost of 40 GBP.

Specify vegetarian or other dietary requirements, if any:
________________________________________________________________

Cheque or official order enclosed for _______GBP

For official orders please also give here the number and address for invoic=
ing:
________________________________________________________________
________________________________________________________________
________________________________________________________________


(Please indicate if you require a receipt of payment)

Return completed forms to: Peter Watson, MRC Cognition and Brain
Sciences Unit, 15 Chaucer Road, Cambridge, CB2 7EF.

Telephone enquiries about bookings: 01223 355294 x801 (has an answerphone)

E-mail enquiries about bookings: =[hidden email]
(important: put "ASSESS" in the Subject field)

---559023410-1804928587-1185354897=:4092--

------------------------------

Date:    Wed, 25 Jul 2007 03:41:21 -0700
From:    Albert-jan Roskam <[hidden email]>
Subject: Re: data dictionary

DISPLAY VARIABLES / VARIABLES = myvar1 myvar2.
DISPLAY DICTIONARY / VARIABLES = myvar1 myvar2.

Will display only the variables of interest.


--- "Parry, James" <[hidden email]> wrote:

> Hi Ken,
>
> Try File . . . Display Data File Information . . .
> Working File (if you have the data open). I believe
> the only way to limit the variables displayed in
> this command is to physically drop the variables.
>
> -HTH
>
> -----Original Message-----
> From: SPSSX(r) Discussion
> [mailto:[hidden email]] On Behalf Of Ken
> Wood
> Sent: Tuesday, July 24, 2007 1:13 PM
> To: [hidden email]
> Subject: data dictionary
>
> How does one save (in order to print) the
> information about all the variables in a given
> dataset?  That is, the information (or selected
> information) that one sees in the Variable View?
>


Cheers!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Did you know that 87.166253% of all statistics claim a precision of results that is not justified by the method employed? [HELMUT RICHTER]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



____________________________________________________________________________________
Get the Yahoo! toolbar and be alerted to new email wherever you're surfing.
http://new.toolbar.yahoo.com/toolbar/features/mail/index.php

------------------------------

Date:    Wed, 25 Jul 2007 12:48:59 +0200
From:    Maddalena Agonigi <[hidden email]>
Subject: include

I am new to SPSS programming and so any help will be deeply appreciated.

I use INSERT in a Script

INSERT FILE=3D'C:\Test.sps'
  SYNTAX=3DBATCH Error=3DStop CD=3DYES.

but  spss give me following message

INSERT <qu=EC>FILE=3D'C:\Test.sps
(0) Istruzione non valida.

Thank you all

------------------------------

Date:    Wed, 25 Jul 2007 05:55:45 -0700
From:    Albert-jan Roskam <[hidden email]>
Subject: Re: include

Hi,

--What SPSS version are you using? INSERT has been
implemented relatively recently (spss v13+ (?)). Did
you try using INCLUDE already?
--Are you absolutely sure the sps file is where you
say it is?

Albert-Jan



Cheers!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Did you know that 87.166253% of all statistics claim a precision of results that is not justified by the method employed? [HELMUT RICHTER]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



____________________________________________________________________________________
Get the Yahoo! toolbar and be alerted to new email wherever you're surfing.
http://new.toolbar.yahoo.com/toolbar/features/mail/index.php

------------------------------

Date:    Wed, 25 Jul 2007 09:49:25 -0400
From:    Ken Wood <[hidden email]>
Subject: Re: data dictionary

Thank you for the many suggestions.  For those interested, the suggestions I received are below.






You can go to the File menu and choose Display Data File Information,
for either the current working file or an external file. You will get
output that you can save in .spo format or copy into Word, Excel or any
other software.




Run DISPLAY DICTIONARY, then go to the Output Viewer, select the
information displayed, and either print it from there or export it to
RTF or Excel format. Recent versions of SPSS show variable names and
structure separately from value labels, which is a bit annoying.



Try this command in syntax:

display dictionary.




DISPLAY VARIABLES / VARIABLES = myvar1 myvar2.
DISPLAY DICTIONARY / VARIABLES = myvar1 myvar2.

Either command displays only the variables of interest.




* Write every DISPLAY DICTIONARY table except Notes to a text file.
OMS
  /SELECT TABLES
  /EXCEPTIF LABELS=['Notes']
  /DESTINATION FORMAT=TEXT
    OUTFILE='dictionary.txt'.
DISPLAY DICTIONARY.
OMSEND.
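If you would rather have the dictionary as a data file you can sort, filter or print, OMS can also write the same information to an SPSS data file. This is a sketch: the identifiers 'File Information' (command) and 'Variable Information' (table subtype) are assumed to be what DISPLAY DICTIONARY produces, and the output path is hypothetical; check the identifiers in your own release (for example in the OMS Identifiers dialog on the Utilities menu) before relying on it.

```spss
* Send the variable-information table from DISPLAY DICTIONARY to a data file.
* Command and subtype identifiers are assumed; verify them in your release.
OMS
  /SELECT TABLES
  /IF COMMANDS=['File Information'] SUBTYPES=['Variable Information']
  /DESTINATION FORMAT=SAV OUTFILE='C:\temp\dictionary.sav'.
DISPLAY DICTIONARY.
OMSEND.
```

The resulting file can then be opened with GET FILE='C:\temp\dictionary.sav'. and printed or exported like any other data file.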



>
> Try File . . . Display Data File Information . . .
> Working File (if you have the data open). I believe
> the only way to limit the variables displayed in
> this command is to physically drop the variables.


Ken Wood, PhD
Research Scientist
KMRREC
West Orange, NJ
973-243-6871

------------------------------

Date:    Wed, 25 Jul 2007 21:58:49 +0800
From:    Hunna Watson <[hidden email]>
Subject: stepwise regression how to include all cases despite missing data

Hi all,

I'm running a stepwise regression of organizational practices on
construction projects that predict project cost growth. I have data for
115 projects, yet some organizational practices were not applicable on
some projects (in a random fashion). The missing data is purposeful, not
due to respondents failing to fill in questionnaires. SPSS automatically
excludes cases with any missing values, or offers to substitute a value,
so I end up with a regression carried out on 10 projects, which is
obviously not useful. Any suggestions for syntax to include all cases,
or other ways to rectify this problem?

Thanks in advance,
Hunna Watson

------------------------------

Date:    Wed, 25 Jul 2007 09:12:37 -0500
From:    Melissa Ives <[hidden email]>
Subject: Re: Random date generator

During the assessment process, interviewers are given these instructions
for estimating a date when a client cannot remember specific days or
months. Perhaps you could create a similar algorithm?

Date Guidelines (d/e):  Use the following rules if the participant is
unsure of the exact date:
DAY: Use the 5th for the beginning of the month, 15th for the middle of
the month, and 25th for the end of the month.
MONTH: Use March for early in the year, July for middle of the year, and
October for later in the year, but try to make it so the number of weeks
is about right.

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Syed Hashmi
Sent: Thursday, July 19, 2007 9:48 PM
To: [hidden email]
Subject: [SPSSX-L] Random date generator

Dear co-listers,

A dataset that I'm analyzing has a set of dates for events (start and
stop dates) as well as how long those events occurred. The data for each
date are in three variables (month, day, year). The years are pretty
complete if they are filled in, but the month and day are sometimes
listed as the exact month or date and other times as the beginning,
middle or end of the year (for the month variable) or of the month (for
the day variable).

Thus, I have 7 vars (startm, startd, starty, stopm, stopd, stopy,
duration) from which I can deduce the start and stop dates (startdt,
stopdt). Unfortunately, I have the complete start and stop dates for
only about half the cases. The rest are missing parts of one of the
dates (e.g. the day) or of both. If I have one of the dates and a
duration, I can calculate the other date.

The reason for this post is that there is a small subset of the
population where I have the complete stop date but am missing the start
day (I have the year and month) and am also missing the duration.  I had
to come up with some way to impute a start date for these cases for
analysis (which will be done with and without these specific cases). I
know that the event could not be more than a month long. Therefore, I was
planning, based on the information I have, to calculate the earliest
possible start date (e_startdt), up to a month before the stop date, and
then randomly pick a date between e_startdt and the stop date.

So my question is: how can I code this? I have an idea of how to do it
in SAS, but since I'm working in SPSS that doesn't help much. I'm
assuming that it will be something simple like:

     startdt = e_startdt + RANDOM_DAYS.

where RANDOM_DAYS is a random whole number of days between 0 and
DATEDIFF(stopdt, e_startdt, "days").

So how would I go about doing this? I tried using the help files and all
but couldn't come up with something that worked. Is this the best way to
do this? Any other way that I can do this? Does it matter what kind of
seeding I use for the random number generator?

Thanks.

- Shahrukh


PRIVILEGED AND CONFIDENTIAL INFORMATION
This transmittal and any attachments may contain PRIVILEGED AND
CONFIDENTIAL information and is intended only for the use of the
addressee. If you are not the designated recipient, or an employee
or agent authorized to deliver such transmittals to the designated
recipient, you are hereby notified that any dissemination,
copying or publication of this transmittal is strictly prohibited. If
you have received this transmittal in error, please notify us
immediately by replying to the sender and delete this copy from your
system. You may also call us at (309) 827-6026 for assistance.

------------------------------

Date:    Wed, 25 Jul 2007 11:13:35 -0300
From:    Hector Maletta <[hidden email]>
Subject: Re: stepwise regression how to include all cases despite missing data

         Hunna,
         Your use of the stepwise method for regression instead of running
with all the variables at once is immaterial for your problem. What you need
is some way of dealing with projects where some organizational practice does
not apply.
         You do not give many details about the variables, but I imagine
that each organizational practice might be a dummy variable, either present
or absent. In such case, you may posit its effect on costs not as a result
of "choosing or not choosing it when it is adequate to choose it" but as a
result of its mere presence. A project may benefit from a practice if (a)
the practice is applicable and (b) it is actually used; otherwise the
project does not benefit from that practice. The absence of a practice may
thus be a result of deliberate choice or impossibility of application, but
in either case it would result in its effect not being observed. In other
words, you may (if your particular situation affords this interpretation)
treat the "missing" cases as negative instances, as zeroes in the dummies,
and proceed with the regression.
         If this road is not conceptually adequate, you're in trouble.
         Hector
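A minimal sketch of Hector's suggestion in syntax. The names prac1 TO prac87 (the practice variables, assumed to be adjacent in the file) and costgrow (the cost-growth outcome) are hypothetical placeholders, and recoding blanks to 0 is only defensible if "not used" can reasonably be read as "no effect", as Hector says.

```spss
* Hypothetical names: prac1 TO prac87 = practice variables, costgrow = outcome.
* Treat blank practice values (system- or user-missing) as "not used" = 0.
RECODE prac1 TO prac87 (MISSING=0).
* Listwise deletion is the REGRESSION default; stated here for clarity.
REGRESSION
  /MISSING=LISTWISE
  /DEPENDENT=costgrow
  /METHOD=STEPWISE prac1 TO prac87.
```

Comparing the number of cases reported in the output before and after the RECODE shows how much of the sample the recoding recovers.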



------------------------------

Date:    Wed, 25 Jul 2007 10:24:27 -0400
From:    Mark A Davenport MADAVENP <[hidden email]>
Subject: Re: stepwise regression how to include all cases despite missing data

Hector,

When I worked at ACT, Inc., we often treated the student's school identifier
this way, usually with great success.  Granted, we had many thousands of
cases to draw from.  Hunna only has 115?  She is going to run out of cases
pretty quickly, don't you think?


***************************************************************************************************************************************************************
Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more
than an exact answer to an approximate question.' --a paraphrase of J. W.
Tukey (1962)







------------------------------

Date:    Wed, 25 Jul 2007 11:36:49 -0300
From:    Hector Maletta <[hidden email]>
Subject: Re: stepwise regression how to include all cases despite missing
         data

Hunna:

Even with a scale, the "missing" responses can be reinterpreted as saying
"This practice was not effective in this case because, for one reason or
another, it was not used". This is not quite clean conceptually, but it is
your only choice unless you put up with working with 10 cases.

The problem, apparently, is in the design of the questionnaire, asking for
the effectiveness of a practice that is not in universal use among the cases
under analysis. In any case, a practice cannot have any effectiveness if it
is not used, so I insist you can treat it as having zero effectiveness when
it was not used.

On the other hand, since your variables seem to be many and your cases
few, perhaps you should consider a more artisanal approach for
identifying effective strategies instead of your all-in-one regression.
With 87 predictors and 115 cases you don't have a chance even without a
single missing value.
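A back-of-the-envelope check on why listwise deletion leaves so few cases, under the simplifying (and assumed) condition that each of the 87 ratings is blank independently, with the same probability p, on every project:

```latex
$$ E[\text{complete cases}] = 115\,(1-p)^{87} $$
$$ 115\,(1-p)^{87} = 10 \;\Rightarrow\; p = 1 - \left(\frac{10}{115}\right)^{1/87} \approx 0.028 $$
```

So a blank rate of under 3% per rating is already enough to reduce 115 projects to about 10 complete cases.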



Hector







  _____

From: Hunna Watson [mailto:[hidden email]]
Sent: 25 July 2007 11:36
To: Hector Maletta
Subject: FW: Re: stepwise regression how to include all cases despite
missing data



Thanks for your reply. I've just come on board this project in the past two
weeks, and the data have already been collected. This is essentially what
happened, though I'm simplifying it: respondents rated how effective the use
of each strategy was for preventing cost growth in the form of work that had
to be done again on the project, so I have data on a scale and no
possibility of coding absent or present :S



Extra information:



Yes, I know all the horrible things about stepwise regression, but it is the
only suitable method I can think of to answer the research questions. I have
just come on board the project in the last two weeks. The research is very
exploratory and the topic hasn't been examined before. Data have been
collected on many different predictor variables (design-related sources,
subcontractor sources, site management sources, contract documentation; the
list goes on and on, up to a terrible 87 predictors). There are 115
projects, each a case if you like, and we want to look at this data set
first (no options there); after that we can merge it with another data set
containing information on a further 160 projects. Some predictors weren't
relevant to some projects; for instance, some didn't use incentives. We
have ratings on scales of 1 to 5 (assessing raters' perceptions of the
contribution of each method to costs), and we are seeking to predict costs
from the predictor variables. If a method wasn't applicable (e.g., use of a
particular incentive plan), it was left blank on the questionnaire. There
is no logical ordering.


------------------------------

Date:    Wed, 25 Jul 2007 11:48:56 -0400
From:    "William B. Ware" <[hidden email]>
Subject: Old Sunflower Option

In the older versions of SPSS, there was something called the sunflower
option to show multiple cases at the same point in a scatter plot.  Does
anyone know how to do that in Version 14 and up?

Bill

__________________________________________________________________________
William B. Ware, Professor                         Educational Psychology,
CB# 3500                                       Measurement, and Evaluation
University of North Carolina                         PHONE  (919)-962-7848
Chapel Hill, NC      27599-3500                      FAX:   (919)-962-1533
Office:  118 Peabody Hall                            EMAIL: [hidden email]
Adjunct Professor                                    School of Social Work
__________________________________________________________________________

------------------------------

Date:    Wed, 25 Jul 2007 10:31:24 -0600
From:    ViAnn Beadle <[hidden email]>
Subject: Re: Old Sunflower Option

The nearest equivalent I can think of to the old sunflower option is the
bin.hex function in GPL. It groups nearby points into hexagonal bins, and
you then use the summary.count function to count the number of hits within
each bin to set the size of the point displayed. This produces what is
sometimes referred to as a bubble plot.

Here's some sample syntax that produces a "bubble" plot using the Employee
data.sav sample file. The SCALE statement constrains the minimum point size
to 5 pixels. IMHO, the default sizing creates really, really small 1-pixel
points which look like dust on my monitor, so this gets around that. The
color.interior function fills in the points (which default to circles). I
think the default hollow points are ugly, so this gets around that too.


GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=prevexp jobtime MISSING=
  LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
 SOURCE: s=userSource(id("graphdataset"))
 DATA: prevexp=col(source(s), name("prevexp"))
 DATA: jobtime=col(source(s), name("jobtime"))
 GUIDE: axis(dim(1), label("Previous Experience (months)"))
 GUIDE: axis(dim(2), label("Months since Hire"))
 SCALE: linear(aesthetic(aesthetic.size), aestheticMinimum(size."5px"))
ELEMENT: point(position(bin.hex(prevexp*jobtime, dim(1,2))),
size(summary.count()), color.interior(color.blue))
END GPL.

The IGRAPH procedure also provides a jittering option which nudges the
points slightly apart by adding a small amount of random variation, but I
think the GPL binhex approach works much better.


------------------------------

Date:    Wed, 25 Jul 2007 18:39:32 +0200
From:    Georg Maubach <[hidden email]>
Subject: AW:      Re: Old Sunflower Option

Hi All,

We tried to run the syntax example on our "Employee data.sav" file. As we have a German version the variable names were translated to German. Does anybody know how we could obtain sample files in English?

Best regards

Georg Maubach
Research Manager



------------------------------

Date:    Wed, 25 Jul 2007 13:28:22 -0500
From:    "Hashmi, Syed S" <[hidden email]>
Subject: Re: Random date generator

Thanks Melissa,

I'm already doing something similar to what you said, using the 1st, 10th
and 20th as the dates for the start, middle and end of the month. The month
is a bit trickier since my exposures are events during pregnancy, so I
have to be careful about just assigning a random month lest it fall
outside the pregnancy duration.

The question I had asked concerned dates that had a month and year but
no day information, not even beginning, middle or end. Therefore, I
couldn't even depend on the 1st-10th-20th coding.

My final solution for the problem where I had a start date and had a
stop month was to pick a random date between the start date (or the
first date of the stop month) and the last day of the stop month.  Gene
Maguin had emailed me earlier and suggested I use the UNIFORM function
to randomly select a date.  I don't think that message was posted on the
list-serv, so the body is copied below.  I've used the function since
and it works nicely.

Thanks again for your help though. I agree that there should be some
sort of algorithm in place at the interviewer level to minimize the
frequency of incomplete data.

- S. Hashmi

*copy of email from Gene*

> -----Original Message-----
> From: Gene Maguin [mailto:[hidden email]]
> Sent: Friday, July 20, 2007 8:05 AM
> To: Hashmi, Syed S
> Subject: RE: Random date generator
>
> Syed,
>
> I'd like to be helpful to you but I don't have time to make up a full
> solution. I think this would be a valid example of your question.
>
> Start date (mm/dd/yyyy): 5/x/2004
> Stop date (mm/dd/yyyy):  6/17/2004
> Possible duration range (6/17/2004)-(5/31/2004)=17 days to
> (6/17/2004)-(5/18/2004)=30 days (I assume a 30 day month)
>
> So x has to be between 18 and 31 inclusive.
>
> So I think the trick to the random draw is these commands.
>
> Compute x=uniform(14).
> Compute x=trunc(x).
>
> Check this, but I'm pretty sure that the range of x will be 0 to 13.
> Your actual day of the month is then:
>
> Compute x=x+18.
>
> There are lots of big 'little bits' to tidy up, but this will get you
> what you want when the tidying up has been done.
>
> Best wishes, Gene Maguin
>
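A fuller sketch of the same idea for the case in the original question (complete stop date, start month and year known, event at most one month long). The variable names startm, starty, stopdt and startdt follow Syed's post; first_m, last_m, e_startdt, l_startdt and span are hypothetical helpers, and the code assumes a release that has DATEDIFF and DATESUM. Truncating RV.UNIFORM(0, span + 1) gives each whole day from 0 to span with equal probability, the same trick as Gene's UNIFORM(14).

```spss
* Fixed seed so the imputed dates are reproducible (compatible generator;
* with SET RNG=MT, SET MTINDEX plays this role instead).
SET SEED=20070725.
* Only impute where the start date is missing but month, year and stop date exist.
DO IF MISSING(startdt) AND NMISS(startm, starty, stopdt) = 0.
* First and last day of the known start month.
COMPUTE first_m = DATE.MDY(startm, 1, starty).
COMPUTE last_m = DATESUM(first_m, 1, "months") - TIME.DAYS(1).
* Window: inside the start month, no earlier than one month before
* the stop date and no later than the stop date itself.
COMPUTE e_startdt = MAX(first_m, DATESUM(stopdt, -1, "months")).
COMPUTE l_startdt = MIN(last_m, stopdt).
COMPUTE span = DATEDIFF(l_startdt, e_startdt, "days").
* Draw a whole number of days from 0 to span, each equally likely.
COMPUTE startdt = DATESUM(e_startdt, TRUNC(RV.UNIFORM(0, span + 1)), "days").
END IF.
FORMATS first_m last_m e_startdt l_startdt startdt (ADATE10).
EXECUTE.
```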



------------------------------

Date:    Wed, 25 Jul 2007 15:05:07 -0400
From:    Richard Ristow <[hidden email]>
Subject: Re: Random date generator

Somehow I missed or deleted the original posting in this thread.
Anyway, on Thursday, July 19, 2007 9:48 PM Hashmi, Syed S asked,

>A dataset that I'm analyzing has a set of dates for events (start and
>stop dates) as well as how long those events occurred.  The data
>for each date are in three variables (month, day, year). The years are
>pretty complete if they are filled in but the month and day are
>sometimes listed as the exact month or date and other times they're
>listed as beginning, middle or end of the year (for the month
>variable) or the month (for the day variable).
>
>I have [two dates as three variables each, plus a duration].
>I have the complete start and stop date for about half the cases. The
>rest are missing either parts of one of the dates (e.g., day) or of
>both.  If I have one of the dates and a duration, I can calculate the
>other date.

So far, so good, though be careful about how precise your 'durations'
are.

>There is a small subset of the population where I have the complete
>stop date but am missing the start day (I have the year and month) and
>am also missing the duration.  I had to come up with some way to
>impute a start date for these cases for analysis (which will be done
>with and without these specific cases).  I know that the event could
>not be more than a month long. I was planning to calculate the earliest
>possible start date (e_startdt) up to a month before the stop date and
>then randomly pick a date between e_startdt and the stop date.

OUCH! I would not do this. Period.

*MAYBE* the start dates and durations you get this way will be vaguely
representative of the population of events, though I doubt it. Are your
durations roughly uniformly distributed from 0 to 30 days? For goodness'
sake, you ought to check that before proceeding.
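
One rough way to run that check is sketched here in Python, not SPSS syntax. It bins the known durations and compares the counts with what a uniform distribution would give, using Pearson's chi-square statistic. The sketch assumes durations are whole days from 1 to 30 and uses 6 bins of 5 days each. Both choices are illustrative and are not taken from the thread. The 5% critical value for 5 degrees of freedom is about 11.07.

```python
def uniformity_chi_square(durations, max_days=30, n_bins=6):
    """Pearson chi-square statistic comparing whole-day durations (1..max_days)
    with a discrete uniform distribution over those days."""
    if not durations:
        raise ValueError("no durations supplied")
    if max_days % n_bins != 0:
        raise ValueError("max_days must be divisible by n_bins")
    days_per_bin = max_days // n_bins       # e.g. 5 possible day values per bin
    counts = [0] * n_bins
    for d in durations:
        if not 1 <= d <= max_days:
            # durations over 30 days were doubted in the thread; check them separately
            raise ValueError(f"duration {d} is outside 1..{max_days} days")
        counts[int(d - 1) // days_per_bin] += 1
    expected = len(durations) / n_bins      # equal expected count in every bin
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    return statistic, counts

# Approximate 5% critical value of chi-square with n_bins - 1 = 5 degrees of freedom
CRITICAL_DF5_ALPHA05 = 11.07

stat, counts = uniformity_chi_square(list(range(1, 31)))
print(stat, counts)  # 0.0 [5, 5, 5, 5, 5, 5]
```

A statistic above the critical value suggests the durations are not uniform. With few cases per bin (expected counts below about 5), the chi-square approximation becomes unreliable.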

But even if they're representative of the population, they have nothing
to do with the individual cases for which they're 'imputed'. No
analysis using those 'dates' will be the least trustworthy.

A far better approach is to use true missing-value interpolation on the
*durations*, not the dates. (See SPSS 'MVA'.) I'm not clear how many
durations you'd have to impute. If it's near 50%, that won't be at all
reliable, either.

-Good luck,
  Richard

------------------------------

Date:    Wed, 25 Jul 2007 15:09:53 -0500
From:    "Hashmi, Syed S" <[hidden email]>
Subject: Re: Random date generator

Richard,

Thanks for your input.  I realize that I was stepping into extremely
treacherous territory when I decided to impute dates and select random
ones.  As for the durations being roughly uniformly distributed, that's
what it looks like from the data I do have.  Initially, I'd assumed that
durations would have a mean of about 7 days, but the data I do have
don't seem to show that.  They're more or less uniformly
distributed.  There were some durations that were >30 days, but I doubt
they're genuine.  Therefore, I decided to go ahead with the uniform
distribution (although the whole imputation and random selection still
bothers me).

The reason that I'm trying to get an idea about the dates, especially
the event start dates, is due to the nature of the study question. I'm
looking at the occurrence of certain events during pregnancy.  However,
these events of interest have to occur within the first trimester, or if
I narrow it down further, the first two months of pregnancy.  Therefore,
I have to know if an event occurred within a certain period of time
after the last menstrual date as reported by the woman.  At the end of
the day, the variables for all the events get filtered down to a single
dichotomous variable - Y/N did the event occur during the period of
interest?
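
Because the end product is a single Y/N flag, it may help to see which incomplete cases actually need a start date at all. The Python sketch below assumes each case has an LMP date and an earliest and latest possible event start date. It takes "the first two months" as a hypothetical 60-day window. If the whole range of possible start dates falls inside the window, or entirely outside it, the flag is settled without any imputation. Only cases whose range straddles a window boundary come back undetermined.

```python
from datetime import date, timedelta

def event_in_window(lmp, earliest_start, latest_start, window_days=60):
    """Return True/False when the flag is the same for every possible start date
    in [earliest_start, latest_start]; return None when the range straddles a
    window boundary. The window is [lmp, lmp + window_days)."""
    window_end = lmp + timedelta(days=window_days)
    if earliest_start >= lmp and latest_start < window_end:
        return True   # every possible start date is inside the window
    if latest_start < lmp or earliest_start >= window_end:
        return False  # every possible start date is outside the window
    return None       # depends on the unknown day: still undetermined

lmp = date(2007, 1, 1)
print(event_in_window(lmp, date(2007, 1, 10), date(2007, 1, 20)))  # True
print(event_in_window(lmp, date(2007, 3, 15), date(2007, 3, 20)))  # False
print(event_in_window(lmp, date(2007, 2, 25), date(2007, 3, 10)))  # None
```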

I will do the analysis with and without the cases where the dates have
been imputed from incomplete data.  I hadn't previously thought of using
true missing-value interpolation on the durations, but I'll look into it.
I've never done that before, so I will have to read up a bit on it.  I
might have an issue with the number of missing values, though, since more
cases have at least some part of the date than a duration value.

Thanks again for your advice. It's always nice to get a fresh look at an
issue.

- Shahrukh

------------------------------

Date:    Wed, 25 Jul 2007 16:24:09 -0400
From:    Gene Maguin <[hidden email]>
Subject: Re: Random date generator

Syed,

It sounds like you are going to use the imputed dates to decide whether something
happened or not. The new variable, 'something happened or not', might be a
dependent variable or it might be an independent variable. There's a
literature on estimating relationships in the presence of missing data. To
correctly estimate relationships (or, at least, come very close), you should
use either multiple imputation or a maximum likelihood estimation method
that incorporates the EM algorithm. So far as I know, SPSS has neither. The
key person here is Donald Rubin. But, there are other, more recent articles.
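
For readers unfamiliar with multiple imputation, the procedure runs in three stages. The data are imputed m times, the analysis is run on each completed dataset, and the m results are pooled with Rubin's rules. The pooled estimate is the average of the m estimates. Its total variance is the average within-imputation variance W plus (1 + 1/m) times the between-imputation variance B. The Python sketch below covers only the pooling step. It assumes you already have one estimate and its squared standard error per imputed dataset; the numbers are made up for illustration.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Pool m point estimates and their sampling variances with Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # average within-imputation variance W
    between = statistics.variance(estimates)   # between-imputation variance B (m - 1 denominator)
    total = within + (1 + 1 / m) * between     # total variance T
    return q_bar, total, math.sqrt(total)

est, var, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, round(var, 4))  # 2.0 1.8333
```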

Gene Maguin

------------------------------

Date:    Wed, 25 Jul 2007 20:43:06 +0000
From:    Hamish Travers <[hidden email]>
Subject: Can someone please tell me how to unsubscribe from this forum,
         thanks in advance

------------------------------

Date:    Wed, 25 Jul 2007 16:41:18 -0500
From:    "Hashmi, Syed S" <[hidden email]>
Subject: Re: Random date generator

Thanks Gene,

After the comments that you and Richard made, I'm seriously considering
rethinking the whole thing.  Maximum likelihood estimation was something
that I had thought of initially but didn't follow up on.  I guess it's
time that I do.  Thanks again for your help.

- Shahrukh


------------------------------

Date:    Wed, 25 Jul 2007 18:33:35 -0400
From:    Richard Ristow <[hidden email]>
Subject: Re: Can someone please tell me how to unsubscribe

FAQ: How to unsubscribe, or leave list SPSSX-L:

Requests to unsubscribe that are posted to the list will never be
acted on.

You must send the request to [hidden email].

From the e-mail address from which you're subscribed to the list, send
a message to [hidden email] with the following words in the
body of the message:

SIGNOFF SPSSX-L

Don't put anything else (your name, etc.) in the body of the message.

It should work. If it doesn't, go to the following Web page:

http://www.listserv.uga.edu/cgi-bin/wa?SUBED1=spssx-l&A=1

and unsubscribe from there.

...........................
More information:

When you subscribed to the list, you received a welcome message (I'm
copying it below) with instructions (including asking you to save it).

From the welcome message:

>Your subscription to the SPSSX-L list (SPSSX(r) Discussion)
>has been accepted.
>
>Please save this message for future reference, [...]
>
>You may leave the list at any time by sending a "SIGNOFF SPSSX-L"
>command to [hidden email].

There are many other commands that can be sent to the same address, to
manage your subscription. If you send mail to [hidden email]
with the text

INFO REFCARD

and no other text, you will be mailed a file describing those commands.

------------------------------

Date:    Wed, 25 Jul 2007 16:00:22 -0700
From:    Karen Powers <[hidden email]>
Subject: RENAME LOOP?

Hello SPSS list,
I have a dataset in which the variable names are var002, var003   ...
var3477.

I would like to RENAME each variable with the number listed in row 1 of
each respective column.
The first number in var002 is 6951030.  I would like var002 renamed
"rs6951030".
The command for doing this once that runs nicely is:

RENAME VARS var002 = "rs"+ "6951030".
EXE.

Now I would like to do this for all vars through var3477.
I have tried DO REPEAT and LOOP commands but they both say
 >Warning # 141.  Command name: RENAME VARS
 >DO REPEAT has no effect on this command.

Any ideas on how I can do this (3477 times)?

Thanks, Karen
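
As the warning says, DO REPEAT has no effect on RENAME VARIABLES. One workaround is to generate the RENAME syntax as text and then run it. The Python sketch below assumes the case-1 values have already been collected into a dictionary keyed by the old variable name. How you get them out of the data file is left open here, and the second value in the example is made up. The function emits one (old=new) pair per variable. It refuses to produce a duplicate new name, because two columns with the same number could not both be renamed.

```python
def build_rename_syntax(first_row_values, prefix="rs"):
    """Build an SPSS RENAME VARIABLES command from {old_name: case-1 value}.
    The input dictionary is hypothetical; fill it however suits your data."""
    seen = set()
    pairs = []
    for old_name, value in first_row_values.items():
        new_name = f"{prefix}{int(value)}"  # 6951030.0 -> "rs6951030"
        if new_name in seen:
            raise ValueError(f"duplicate new name {new_name} (from {old_name})")
        seen.add(new_name)
        pairs.append(f"({old_name}={new_name})")
    # continuation lines are indented, as syntax run through INCLUDE expects
    return "RENAME VARIABLES " + "\n  ".join(pairs) + "."

print(build_rename_syntax({"var002": 6951030, "var003": 1234567.0}))
# RENAME VARIABLES (var002=rs6951030)
#   (var003=rs1234567).
```

The generated text could be saved as a .sps file and run with INCLUDE, so all 3,476 renames happen in a single command.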

------------------------------

Date:    Wed, 25 Jul 2007 19:27:19 -0400
From:    Richard Ristow <[hidden email]>
Subject: Re: Aggregating with missing data

At 03:30 AM 7/14/2007, Marco wrote:

>When using the aggregate (mean) function in SPSS, cells that contain
>missing data become empty. Thus, a cell that should contain the mean
>of multiple cells (one of which is empty/missing), turns to zero
>because it contains one missing datum.

I'm not sure what's happening to you, but you shouldn't be seeing what
you say you're seeing.

From your description, it sounds like you're doing one of two things:
a) Using the MEAN function with command AGGREGATE to average over a set
of variables
b) Using the MEAN function in the transformation language to average
over a set of variables.

BOTH of those, however, ignore missing values when averaging, and take
the mean of the non-missing values; they don't make a value 0 because
there's a missing value in the list. (That would be a very dangerous
thing to do anyway.) So I'm not sure what's happening.
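
Outside SPSS, the behaviour described above amounts to the Python sketch below, with None standing in for system-missing. The mean is taken over the valid values only. It becomes missing (never 0) only when too few valid values remain. The `min_valid` parameter loosely mirrors the MEAN.n form of the SPSS transformation function; for example, MEAN.2 requires at least two valid arguments. The data are the test values from the listing that follows.

```python
def mean_ignoring_missing(values, min_valid=1):
    """Mean of the non-missing values; None marks a missing value."""
    valid = [v for v in values if v is not None]
    if not valid or len(valid) < min_valid:
        return None  # too few valid values: the result is missing, not 0
    return sum(valid) / len(valid)

groups = {1: [1, 2, 3], 2: [4, 5, None, 7], 3: [None, 9, 10]}
for group, values in groups.items():
    # len(values) counts all cases, like NU in the AGGREGATE example
    print(group, len(values), f"{mean_ignoring_missing(values):.2f}")
# 1 3 2.00
# 2 4 5.33
# 3 3 9.50
```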

Could you post the syntax, some test data, what output you get, and
tell us what output you want?

.....................................
Here are demonstrations of averaging across cases with AGGREGATE, and
averaging across variables. It's SPSS 15 draft output (WRR-not saved
separately).
.....................................
Using AGGREGATE to average over cases:
List
|-----------------------------|---------------------------|
|Output Created               |25-JUL-2007 19:16:03       |
|-----------------------------|---------------------------|
[Aggregate]

Group Value

     1     1
     1     2
     1     3
     2     4
     2     5
     2     .
     2     7
     3     .
     3     9
     3    10


Number of cases read:  10    Number of cases listed:  10


AGGREGATE OUTFILE=*
   /BREAK=GROUP
   /Members 'Size of group' = NU
   /MEAN    'Mean of "value"' = MEAN(VALUE).

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |25-JUL-2007 19:16:03       |
|-----------------------------|---------------------------|
Group Members     MEAN

     1       3     2.00
     2       4     5.33
     3       3     9.50

Number of cases read:  3    Number of cases listed:  3
.....................................
Using MEAN to average over variables:
List
|-----------------------------|---------------------------|
|Output Created               |25-JUL-2007 19:16:04       |
|-----------------------------|---------------------------|
[Wide]

Group Members Value.1 Value.2 Value.3 Value.4

     1      3       1       2       3       .
     2      4       4       5       .       7
     3      3       .       9      10       .


Number of cases read:  3    Number of cases listed:  3


NUMERIC Mean (F6.2).
COMPUTE Mean = MEAN(Value.1 TO Value.4).
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |25-JUL-2007 19:16:26       |
|-----------------------------|---------------------------|
[Wide]

Group Members Value.1 Value.2 Value.3 Value.4   Mean

     1      3       1       2       3       .    2.00
     2      4       4       5       .       7    5.33
     3      3       .       9      10       .    9.50

Number of cases read:  3    Number of cases listed:  3

===================
APPENDIX:  All code
===================
I keyed the test data into the Data Editor; however, it can be
recovered from the LIST output fairly easily. Here's all the code:

DATASET ACTIVATE TestData.
DATASET COPY     Aggregate.
DATASET ACTIVATE Aggregate WINDOW=FRONT.

LIST.

AGGREGATE OUTFILE=*
   /BREAK=GROUP
   /Members 'Size of group' = NU
   /MEAN    'Mean of "value"' = MEAN(VALUE).

LIST.

DATASET ACTIVATE TestData.
DATASET COPY     Wide.
DATASET ACTIVATE Wide      WINDOW=FRONT.

SORT CASES BY Group .
CASESTOVARS
  /ID = Group
  /GROUPBY = VARIABLE
  /COUNT = Members "Size of group" .

LIST.

NUMERIC Mean (F6.2).
COMPUTE Mean = MEAN(Value.1 TO Value.4).

LIST.

------------------------------

Date:    Wed, 25 Jul 2007 20:24:54 -0400
From:    Richard Ristow <[hidden email]>
Subject: Re: Matching files on one of three possible ID's

At 09:29 AM 7/12/2007, Daniel Robertson wrote:

>I have two database extracts that I am trying to merge, one of which
>contains enrolled students and the other contains approximately the
>same group of students when they were applicants. In the Enrollment
>file students are uniquely identified by 'enroll_id'. In the Applicant
>file there is a primary ID, 'applicant_id1', but there may be up to
>two other IDs which were issued and updated provisionally as the
>student was going through the application process. The rub is that
>'enroll_id' may match any one of the applicant IDs, not necessarily
>the primary one.

Gene's given you a workable solution. It requires sorting the data
three times, but with three keys, something like that is inevitable.

You *can* combine the three sorting operations into one step, by using
XSAVE to create three copies of each Applicant record, one each in
which 'enroll_id' is loaded from each of the three candidate key
variables in the Applicant file. Then sort the resulting file by that
'enroll_id', MATCH FILES with the Enrollment file, and discard any
Applicant records that don't match.
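
No code was posted for this approach, so here is a rough Python analogue of the XSAVE idea (not SPSS syntax). Each applicant record yields one (key, record) copy per non-missing candidate ID. The copies are sorted by key, and only those whose key appears in the Enrollment file are kept. 'applicant_id1' comes from the original question; 'applicant_id2' and 'applicant_id3' are hypothetical names for the two provisional IDs. An ID repeated within the same record is counted once, which is one possible way to handle the duplicate-value case.

```python
def match_on_any_id(enroll_ids, applicants,
                    id_fields=("applicant_id1", "applicant_id2", "applicant_id3")):
    """Return (enroll_id, applicant_record) pairs, sorted by ID, for every
    applicant ID that matches an enrolled student. Missing IDs are None."""
    enrolled = set(enroll_ids)
    copies = []
    for record in applicants:
        used = set()
        for field in id_fields:
            key = record.get(field)
            if key is None or key in used:
                continue  # skip missing IDs and an ID repeated in another field
            used.add(key)
            copies.append((key, record))  # one copy per candidate ID, like XSAVE
    copies.sort(key=lambda pair: pair[0])
    return [(key, rec) for key, rec in copies if key in enrolled]  # discard non-matches

applicants = [
    {"name": "A", "applicant_id1": 11, "applicant_id2": 101},
    {"name": "B", "applicant_id1": 202, "applicant_id3": 202},
    {"name": "C", "applicant_id1": 999},
]
matches = match_on_any_id([101, 202, 303], applicants)
print([(key, rec["name"]) for key, rec in matches])  # [(101, 'A'), (202, 'B')]
```

If two different IDs on one applicant record match two different enrolled students, this sketch returns both pairs; that case would need its own rule.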

Now, that's the simplest possible case. You may need logic in case,
say, the same ID value occurs in more than one of the Applicant-record
fields. But it's another way to go.

Sorry; no code this time.

------------------------------

End of SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)
**************************************************************

Re: SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)

dnyeboa
Hi everyone,
I'm new to SPSS and I wanted to know whether SPSS keeps track of
what has been done to a variable. I know that you can paste your actions into
a Syntax window and thus save a program. But does anyone know how one might
retrieve the programming of a variable if it was not initially pasted and saved?

Also, how do you find the "log" of your actions in SPSS?

Thank you very much for your time and assistance,
Debra

Quoting Melissa Ives <[hidden email]>:

> Hmm, I would think about looking into Version 16 which is about to come out
> in a few months and promises to be less 'quirky' than 15.
>
> Melissa
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
> Chelminski, Iwona
> Sent: Thursday, July 26, 2007 10:52 AM
> To: [hidden email]
> Subject: Re: [SPSSX-L] SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007
> (#2007-207)
>
> Hi Group,
> I'm thinking about buying the 15 version. Is it any good? They want over
> $1,500 for it so i want to make sure that it's not a piece of crap like the
> version 10 was.
> Any comments?
> Thanks in advance
>
> Iwona
>
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
> Automatic digest processor
> Sent: Thursday, July 26, 2007 12:01 AM
> To: Recipients of SPSSX-L digests
> Subject: SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)
>
>
> There are 24 messages totalling 1629 lines in this issue.
>
> Topics of the day:
>
>   1. SPSS 15 loses reg. with Vista
>   2. ANNOUNCE: SPSS USERS MEETING (INC BOOKING FORM) REMINDER
>   3. data dictionary (2)
>   4. include (2)
>   5. stepwise regression how to include all cases despite missing data (3)
>   6. Random date generator (6)
>   7. stepwise regression how to include all cases despite missing
>      data
>   8. Old Sunflower Option (2)
>   9. AW:      Re: Old Sunflower Option
>  10. Can someone please tell me how to unsubscribe from this forum, thanks
> in
>      advance
>  11. Can someone please tell me how to unsubscribe
>  12. RENAME LOOP?
>  13. Aggregating with missing data
>  14. Matching files on one of three possible ID's
>
> ----------------------------------------------------------------------
>
> Date:    Wed, 25 Jul 2007 04:36:06 -0400
> From:    Nico Munting <[hidden email]>
> Subject: Re: SPSS 15 loses reg. with Vista
>
> I hope you have solved this problem by now, but if you haven't you might
> try the following.
>
> I do not have experience with SPSS on Vista, but my guess is that SPSS is
> trying to store the registration information in C:\Program Files\ or the
> equivalent in your situation. However, because in Vista you are not running
> all programs as Administrator, SPSS does not have write-access to the
> Program Files directory.
>
> To give SPSS write-access to the Program Files directory right click on the
> shortcut you are using to launch SPSS and select "Run As...". Choose an
> account with Administrative privileges and you might be prompted for a
> password or a confirmation. After this SPSS should start normally. Now
> complete the registration, and since SPSS was started in Administrative
> mode, it should be able to write the registration information to the
> Program Files directory.
>
> If all is well you should be able to start SPSS normally from this point
> on, and hopefully it remembers your registration.
>
> Good luck,
>
> Nico
>
>
> On Wed, 4 Jul 2007 06:33:47 +0100, David Hitchin <[hidden email]>
> wrote:
>
> >I have installed SPSS 15.0, including the patch to take it to 15.0.1 and
> >the additional patch to cope with Vista problems. The new licence
> >procedure works, and SPSS functions as expected.
> >
> >At some point, either when SPSS is stopped or the machine is turned off,
> >the registration is lost; next time SPSS is started it has to be
> >re-registered.
> >
> >Any ideas?
> >
> >David Hitchin
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 10:14:57 +0100
> From:    Peter Watson <[hidden email]>
> Subject: ANNOUNCE: SPSS USERS MEETING (INC BOOKING FORM) REMINDER
>
>   This message is in MIME format.  The first part should be readable text,
>   while the remaining parts are likely unreadable without MIME-aware tools.
>
> ---559023410-1804928587-1185354897=:4092
> Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed
> Content-Transfer-Encoding: QUOTED-PRINTABLE
>
>
> A reminder of the ASSESS meeting in November:
>
> ASSESS: SPSS USERS' GROUP
> 21st ANNUAL MEETING
> FRIDAY 9th NOVEMBER 2007
> ALCUIN RESEARCH RESOURCE CENTRE AUDITORIUM=20
> UNIVERSITY OF YORK, YORK
>
>
> ASSESS is an independent user group for SPSS, a computer package for analys=
> ing=20
> and presenting data. It is run by users, for users and is completely indepe=
> ndent
>   of manufacturers of the software. The meeting is open to all users of SPS=
> S and=20
> to anyone interested in SPSS.
>
> Come along to:
> * hear SPSS users talk about applications,the problems and solutions
> * hear the latest news from SPSS UK staff about product developments,
>    and put your questions to them
> * question a panel of experts about particular problems
> * exchange ideas with other SPSS users
> * plan for an even better user group.
>
> The venue is the Alcuin Research Resource Centre (ARRC) on York University=
> =20
> campus located in Heslington, 2 miles to the south-east of the city centre.=
> =20
> It takes 10-15 minutes in a taxi from the railway station. The Number 4 bus=
> =20
> runs regularly to the University from York railway station (see=20
> http://www.yorkshiretravel.net/). Parking at the University is very difficu=
> lt.=20
> Location details are at http://www.york.ac.uk/np/maps/. Accommodation is=20
> bookable via tourist information on (01904) 621756 or (01904) 554455.
>
> THE PROVISIONAL PROGRAMME**
>
> * Welcome and introduction to meeting
>
> * SPSS company and product news; SPSS software demonstrations
>
> * How and why to document data for long-term storage, and what's special=20
> about GI (geographical) data? by Allan Reese, CEFAS
>
> * Making the world a better place with SPSS: analysing & predicting
> charity donor behaviour using SPSS Base
> by John Sauve-Rodd, Datapreneurs
>
> * Applications of OMS by Gilbert MacKenzie, University of Limerick
>
> * Multivariate aspects of testing the savannah hypothesis of shopping=20
> by Charles Dennis, Brunel University
>
> *  Mousing with SPSS: useful point and click=20
> by Frances Provan, University of Edinburgh
>
>
> * Users" Question Time and Clinic
>
> * Annual General Meeting of ASSESS.
>
> Registration and coffee will start at 10am. Papers and other events will ru=
> n=20
> from 10.30am to about 5.00pm. Morning coffee, lunch and afternoon tea are=
> =20
> included in the registration fee. A timetable will be e-mailed to delegates
>   in advance of the meeting.
>
>
>
>
> ______________________
> ** The titles and order of events are subject to amendment.
> ---------------------------------------------------------------------------=
> -----
>
>                               BOOKING FORM
>                         ASSESS : SPSS USERS' GROUP
>          Friday 9th November 2007, ARRC auditorium, University of York
>
> Important:
>
> Bookings will not be treated as firm until a cheque or official (company) o=
> rder,
> payable to ASSESS, is received. Payment possible by BACS. Details on reques=
> t.
>
>
> Name:  ______________________________ Tel: ____________________
> Email: ______________________________ Fax: ____________________
>
> Job Title:    ___________________________________
> Organization: ___________________________________
> Address:      ___________________________________________________
>                ___________________________________________________
>                _______________________   Postcode  _______________
>
> Strike out the sections which do not apply to you, or otherwise amend as
> appropriate:
>
> INDIVIDUAL BOOKING. Please reserve a place for me, at a cost of 65 GBP.
>
> CORPORATE BOOKING. (Enter the appropriate amounts)
>
> Please reserve ______ places, at a cost of =A3_____ (65 GBP, for the first =
> person,=20
> and 55 GBP for each subsequent person) .
>
> Names of attendees : 1. _______________________________________
> (for badges)         2. _______________________________________
>                       3. _______________________________________
>                       4. _______________________________________
>                       5. _______________________________________
>
> STUDENT (POST-GRADUATE) BOOKING. (Enclose photocopied evidence of status
> for 2007-2008 academic year). Please reserve for me one of the student
> places at a cost of 40 GBP.
>
> Specify vegetarian or other dietary requirements, if any:
> ________________________________________________________________
>
> Cheque or official order enclosed for _______GBP
>
> For official orders please also give here the number and address for invoic=
> ing:
> ________________________________________________________________
> ________________________________________________________________
> ________________________________________________________________
>
>
> (Please indicate if you require a receipt of payment)
>
> Return completed forms to: Peter Watson, MRC Cognition and Brain
> Sciences Unit, 15 Chaucer Road, Cambridge, CB2 7EF.
>
> Telephone enquiries about bookings: 01223 355294 x801 (has an answerphone)
>
> E-mail enquiries about bookings: =[hidden email]
> (important: put "ASSESS" in the Subject field)
>
> ---559023410-1804928587-1185354897=:4092--
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 03:41:21 -0700
> From:    Albert-jan Roskam <[hidden email]>
> Subject: Re: data dictionary
>
> DISPLAY VARIABLES / VARIABLES = myvar1 myvar2.
> DISPLAY DICTIONARY / VARIABLES = myvar1 myvar2.
>
> Will display only the variables of interest.
>
>
> --- "Parry, James" <[hidden email]> wrote:
>
> > Hi Ken,
> >
> > Try File . . . Display Data File Information . . .
> > Working File (if you have the data open). I believe
> > the only way to limit the variables displayed in
> > this command is to physically drop the variables.
> >
> > -HTH
> >
> > -----Original Message-----
> > From: SPSSX(r) Discussion
> > [mailto:[hidden email]] On Behalf Of Ken
> > Wood
> > Sent: Tuesday, July 24, 2007 1:13 PM
> > To: [hidden email]
> > Subject: data dictionary
> >
> > How does one save (in order to print) the
> > information about all the variables in a given
> > dataset?  That is, the information (or selected
> > information) that one sees in the Variable View?
> >
>
>
> Cheers!
> Albert-Jan
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Did you know that 87.166253% of all statistics claim a precision of results
> that is not justified by the method employed? [HELMUT RICHTER]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
>
>
________________________________________________________________________________
____

> Get the Yahoo! toolbar and be alerted to new email wherever you're surfing.
> http://new.toolbar.yahoo.com/toolbar/features/mail/index.php
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 12:48:59 +0200
> From:    Maddalena Agonigi <[hidden email]>
> Subject: include
>
> I am new to SPSS programming and so any help will be deeply appreciated.
>
> I use INSERT in a Script
>
> INSERT FILE=3D'C:\Test.sps'
>   SYNTAX=3DBATCH Error=3DStop CD=3DYES.
>
> but  spss give me following message
>
> INSERT <qu=EC>FILE=3D'C:\Test.sps
> (0) Istruzione non valida.
>
> Thank you all
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 05:55:45 -0700
> From:    Albert-jan Roskam <[hidden email]>
> Subject: Re: include
>
> Hi,
>
> --What SPSS version are you using? INSERT has been
> implemented relatively recently (spss v13+ (?)). Did
> you try using INCLUDE already?
> --Are you absolutely sure the sps file is where you
> say it is?
>
> Albert-Jan
>
> --- Maddalena Agonigi <[hidden email]> wrote:
>
> > I am new to SPSS programming and so any help will be
> > deeply appreciated.
> >
> > I use INSERT in a Script
> >
> > INSERT FILE='C:\Test.sps'
> >   SYNTAX=BATCH Error=Stop CD=YES.
> >
> > but  spss give me following message
> >
> > INSERT <quì>FILE='C:\Test.sps
> > (0) Istruzione non valida.
> >
> > Thank you all
> >
>
>
> Cheers!
> Albert-Jan
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Did you know that 87.166253% of all statistics claim a precision of results
> that is not justified by the method employed? [HELMUT RICHTER]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
>
>
________________________________________________________________________________
____

> Get the Yahoo! toolbar and be alerted to new email wherever you're surfing.
> http://new.toolbar.yahoo.com/toolbar/features/mail/index.php
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 09:49:25 -0400
> From:    Ken Wood <[hidden email]>
> Subject: Re: data dictionary
>
> Thank you for the many suggestions.  For those interested, the suggestions I
> received are below.
>
>
>
>
>
>
> you can go to File Menu and choose the option display information
> about the current working file or an external file. You will have an output
> that you can save in .spo format or even copy to word, excel or any other
> software.
>
>
>
>
> Do Display Dictionary, then go to the Output, select the information
> displayed, and either print from there, or export to either RTF or
> Excel formats. Recent versions of SPSS provide information on
> variable names and structure separately from Value labels, which is
> a bit annoying.
>
>
>
> try this command in syntax:
>
> display dictionary.
>
>
>
>
> DISPLAY VARIABLES / VARIABLES = myvar1 myvar2.
> DISPLAY DICTIONARY / VARIABLES = myvar1 myvar2.
>
> Will display only the variables of interest.
>
>
>
>
> *OMS
> *  /SELECT TABLES
> *  /EXCEPTIF LABELS =['Notes']
> *  /DESTINATION
> *        Format = Text
> *        OUTFILE = "dictionary.txt".
> *DISPLAY DICTIONARY.
> *OMSEND.
>
>
>
> >
> > Try File . . . Display Data File Information . . .
> > Working File (if you have the data open). I believe
> > the only way to limit the variables displayed in
> > this command is to physically drop the variables.
> >
> >
> >
> > -----Original Message-----
> > From: SPSSX(r) Discussion
> > [mailto:[hidden email]] On Behalf Of Ken
> > Wood
> > Sent: Tuesday, July 24, 2007 1:13 PM
> > To: [hidden email]
> > Subject: data dictionary
> >
> > How does one save (in order to print) the
> > information about all the variables in a given
> > dataset?  That is, the information (or selected
> > information) that one sees in the Variable View?
> >
>
>
> Ken Wood, PhD
> Research Scientist
> KMRREC
> West Orange, NJ
> 973-243-6871
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 21:58:49 +0800
> From:    Hunna Watson <[hidden email]>
> Subject: stepwise regression how to include all cases despite missing data
>
> Hi all,=20
> =20
> I'm running a stepwise regression of organizational practices on =
> construction projects that predict project cost growth. I have data for =
> 115 projects, yet some organizational practices were not applicable on =
> some projects (in a random fashion). the missing data is obviously =
> purposeful and not due to not filling in questionnaires etc. spss =
> automatically excludes cases with any missing values, or wants to =
> substitute a value, so I end up with a regression being carried out on =
> 10 projects, obviously not useful. Any suggestions for syntax to include =
> all cases or suggestions to rectify this problem?
> =20
> Thanks in advance,=20
> Hunna Watson
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 09:12:37 -0500
> From:    Melissa Ives <[hidden email]>
> Subject: Re: Random date generator
>
> During the assessment process, interviewers are given these instructions
> for estimating a date when a client cannot remember specific days or
> months. Perhaps you could create a similar algorithm?
>
> Date Guidelines (d/e):  Use the following rules if the participant is
> unsure of the exact date:
> DAY: Use the 5th for the beginning of the month, 15th for the middle of
> the month, and 25th for the end of the month.
> MONTH: Use March for early in the year, July for middle of the year, and
> October for later in the year, but try to make it so the number of weeks
> is about right.
>
> Melissa
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
> Syed Hashmi
> Sent: Thursday, July 19, 2007 9:48 PM
> To: [hidden email]
> Subject: [SPSSX-L] Random date generator
>
> Dear co-listers,
>
> A dataset that I'm analyzing has a set of dates for events (start and
> stop dates) as well as how long those events lasted. The data for each
> date is in three variables (month, day, year). The years are pretty
> complete if they are filled in, but the month and day are sometimes
> listed as the exact month or date and other times as beginning, middle
> or end of the year (for the month variable) or of the month (for the
> day variable).
>
> Thus, I have 7 vars (startm, startd, starty, stopm, stopd, stopy,
> duration) from which I can deduce the start and stop dates (startdt,
> stopdt). Unfortunately, I have the complete start and stop dates for
> only about half the cases. The rest are missing parts of one of the
> dates (e.g., the day) or of both. If I have one of the dates and a
> duration, I can calculate the other date.
>
> The reason for this post is that there is a small subset of the
> population where I have the complete stop date but am missing the start
> day (I have the year and month) and am also missing the duration. I had
> to come up with some way to impute a start date for these cases for
> analysis (which will be done with and without these specific cases). I
> know that the event could not be more than a month long. Therefore, I was
> planning to use the information I have to calculate the earliest possible
> start date (e_startdt), up to a month before the stop date, and then
> randomly pick a date between e_startdt and the stop date.
>
> So my question is: how do I code this? I have an idea of how to do it
> in SAS, but since I'm working in SPSS that doesn't help much. I'm
> assuming that it will be something simple like:
>
>      startdt = e_startdt + RANDOM_DAYS.
>
> where RANDOM_DAYS is a random whole number of days between 0 and
> DATEDIFF(stopdt, e_startdt, "days").
>
> So how would I go about doing this? I tried the help files but couldn't
> come up with something that worked. Is this the best way to do this? Is
> there any other way I can do this? Does it matter what kind of seeding I
> use for the random number generator?
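Here is one way to code the idea, sketched in Python rather than SPSS. The possible start dates are bounded by three things: the known start month, the stop date, and the one-month maximum. A uniformly drawn whole number of days is then added to the earliest bound.

The 30-day maximum is an assumption. For example, with a stop date of 17 June 2004 and a May start month, the draw falls between 18 May and 31 May 2004.

SPSS stores dates as seconds, so an SPSS version of e_startdt + RANDOM_DAYS would need a date-arithmetic function such as DATESUM, or the day count multiplied by 86400. The seed matters only for reproducibility: fixing it (SET SEED in SPSS) gives the same imputed dates on every run.

```python
import calendar
import random
from datetime import date, timedelta

MAX_DURATION_DAYS = 30  # assumption: the event lasts at most about a month


def impute_start(start_year, start_month, stop_dt, rng):
    """Draw a start date uniformly from the days consistent with the data:
    inside the known start month, not after the stop date, and no more
    than MAX_DURATION_DAYS before it."""
    first_of_month = date(start_year, start_month, 1)
    days_in_month = calendar.monthrange(start_year, start_month)[1]
    last_of_month = date(start_year, start_month, days_in_month)
    e_startdt = max(first_of_month, stop_dt - timedelta(days=MAX_DURATION_DAYS))
    l_startdt = min(last_of_month, stop_dt)
    if e_startdt > l_startdt:
        raise ValueError("no start date fits these inputs")
    # RANDOM_DAYS: whole days from 0 to the span, both ends included
    random_days = rng.randint(0, (l_startdt - e_startdt).days)
    return e_startdt + timedelta(days=random_days)


rng = random.Random(20070725)  # fixed seed: the same draws on every run
print(impute_start(2004, 5, date(2004, 6, 17), rng))
```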
>
> Thanks.
>
> - Shahrukh
>
>
> PRIVILEGED AND CONFIDENTIAL INFORMATION
> This transmittal and any attachments may contain PRIVILEGED AND
> CONFIDENTIAL information and is intended only for the use of the
> addressee. If you are not the designated recipient, or an employee
> or agent authorized to deliver such transmittals to the designated
> recipient, you are hereby notified that any dissemination,
> copying or publication of this transmittal is strictly prohibited. If
> you have received this transmittal in error, please notify us
> immediately by replying to the sender and delete this copy from your
> system. You may also call us at (309) 827-6026 for assistance.
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 11:13:35 -0300
> From:    Hector Maletta <[hidden email]>
> Subject: Re: stepwise regression how to include all cases despite missing
> data
>
> Hunna,
> Your use of the stepwise method for regression instead of running with all
> the variables at once is immaterial to your problem. What you need is some
> way of dealing with projects where some organizational practice does not
> apply.
> You do not give many details about the variables, but I imagine that each
> organizational practice might be a dummy variable, either present or
> absent. In such a case, you may posit its effect on costs not as a result
> of "choosing or not choosing it when it is adequate to choose it" but as a
> result of its mere presence. A project may benefit from a practice if (a)
> the practice is applicable and (b) it is actually used; otherwise the
> project does not benefit from that practice. The absence of a practice may
> thus be a result of deliberate choice or impossibility of application, but
> in either case its effect would not be observed. In other words, you may
> (if your particular situation affords this interpretation) treat the
> "missing" cases as negative instances, as zeroes in the dummies, and
> proceed with the regression.
> If this road is not conceptually adequate, you're in trouble.
> Hector
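If Hector's reading fits the data, the recode itself is simple. The Python sketch below uses made-up projects and hypothetical variable names (incentive, partnering). It shows how listwise deletion shrinks the sample while recoding blanks to 0 keeps every case.

In SPSS, the equivalent step would be something like RECODE incentive partnering (SYSMIS=0). run before REGRESSION, assuming the blanks are system-missing.

```python
# Hypothetical data: one row per project, each practice a 0/1 dummy,
# None where the practice was "not applicable" (left blank).
projects = [
    {"cost_growth": 4.0, "incentive": 1, "partnering": None},
    {"cost_growth": 7.5, "incentive": None, "partnering": 1},
    {"cost_growth": 2.1, "incentive": 0, "partnering": 0},
]
practices = ["incentive", "partnering"]

# Listwise deletion keeps only projects with no blanks at all.
complete = [p for p in projects if all(p[v] is not None for v in practices)]

# Hector's reading: a practice that was not used, for whatever reason,
# had no effect, so a blank is recoded to 0 (practice absent).
recoded = [
    {**p, **{v: (0 if p[v] is None else p[v]) for v in practices}}
    for p in projects
]

print(len(complete), len(recoded))
```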
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 10:24:27 -0400
> From:    Mark A Davenport MADAVENP <[hidden email]>
> Subject: Re: stepwise regression how to include all cases despite missing
> data
>
> Hector,
>
> When I worked at ACT, Inc we often treated the student's school identifier
> this way, usually with great success.  Granted, we had many thousands of
> cases to draw from.  Hunna only has 115?  She is going to run out of cases
> pretty quickly, don't you think?
>
> Mark A. Davenport Ph.D.
> Senior Research Analyst
> Office of Institutional Research
> The University of North Carolina at Greensboro
> 336.256.0395
> [hidden email]
>
> 'An approximate answer to the right question is worth a good deal more
> than an exact answer to an approximate question.' --a paraphrase of J. W.
> Tukey (1962)
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 11:36:49 -0300
> From:    Hector Maletta <[hidden email]>
> Subject: Re: stepwise regression how to include all cases despite missing
>          data
>
> Hunna:
>
> Even with a scale, the "missing" responses can be reinterpreted as saying
> "This practice was not effective in this case because, for one reason or
> another, it was not used". This is not quite clean conceptually, but it is
> your only choice unless you put up with working with 10 cases.
>
> The problem, apparently, is in the design of the questionnaire, which asks
> for the effectiveness of a practice that is not in universal use among the
> cases under analysis. In any case, a practice cannot have any effectiveness
> if it is not used, so I insist you can treat it as having zero
> effectiveness when it was not used.
>
> On the other hand, since your variables seem to be many and your cases
> few, perhaps you should consider a more artisanal approach for identifying
> effective strategies instead of your all-in-one regression. With 87
> predictors and 115 cases you don't have a chance even without a single
> missing value.
>
> Hector
>
>   _____
>
> From: Hunna Watson [mailto:[hidden email]]
> Sent: 25 July 2007 11:36
> To: Hector Maletta
> Subject: FW: Re: stepwise regression how to include all cases despite
> missing data
>
> Thanks for your reply. I've just come on board this project in the past
> two weeks, and the data have already been collected. This is essentially
> what happened, though I'm simplifying it: respondents rated how effective
> the use of each strategy was for preventing cost growth in the form of
> work that had to be redone on the project, so I have data on a scale and
> no possibility of coding absent or present :S
>
> Extra information...
>
> Yes, I know all the horrible things about stepwise, but it is the only
> suitable method I can think of to answer the research questions. The
> research is very exploratory, and the topic hasn't been examined before.
> Data have been collected on many different predictor variables
> (design-related sources, subcontractor sources, site management sources,
> contract documentation; the list goes on and on, up to a terrible 87
> predictors). There are 115 projects, so each is a case if you like, and
> we want to first look at this data set (no options there), but after that
> we can merge it with another data set containing information on a further
> 160 projects. Some predictors weren't relevant to some projects; for
> instance, some didn't use incentives. We have ratings on scales of 1 to 5
> (assessing raters' perceptions of the contribution of the use of that
> method to costs), and we are seeking to predict costs from the predictor
> variables. If a method wasn't applicable (e.g., use of a particular
> incentive plan), it was left blank on the questionnaire. There is no
> logical ordering.
>
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 11:48:56 -0400
> From:    "William B. Ware" <[hidden email]>
> Subject: Old Sunflower Option
>
> In the older versions of SPSS, there was something called the sunflower
> option to show multiple cases at the same point in a scatter plot.  Does
> anyone know how to do that in Version 14 and up?
>
> Bill
>
> __________________________________________________________________________
> William B. Ware, Professor                         Educational Psychology,
> CB# 3500                                       Measurement, and Evaluation
> University of North Carolina                         PHONE  (919)-962-7848
> Chapel Hill, NC      27599-3500                      FAX:   (919)-962-1533
> Office:  118 Peabody Hall                            EMAIL: [hidden email]
> Adjunct Professor                                    School of Social Work
> __________________________________________________________________________
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 10:31:24 -0600
> From:    ViAnn Beadle <[hidden email]>
> Subject: Re: Old Sunflower Option
>
> The nearest equivalent I can think of to the old sunflower option is the
> bin.hex function in GPL. It groups together nearby points, and then you use
> the summary.count function to count the number of hits within each bin to
> set the size of the point displayed. This produces what is sometimes
> called a bubble plot.
>
> Here's some sample syntax which produces a "bubble" plot using the
> Employee data.sav sample file. The SCALE statement constrains the minimum
> point size to 5 pixels. IMHO, the default sizing creates really, really
> small 1-pixel points which look like dust on my monitor, so this gets
> around that. The color.interior function fills in the points (which
> default to circles). I think the default hollow points are ugly, so this
> gets around that too.
>
>
> GGRAPH
>   /GRAPHDATASET NAME="graphdataset" VARIABLES=prevexp jobtime MISSING=
>   LISTWISE REPORTMISSING=NO
>   /GRAPHSPEC SOURCE=INLINE.
> BEGIN GPL
>  SOURCE: s=userSource(id("graphdataset"))
>  DATA: prevexp=col(source(s), name("prevexp"))
>  DATA: jobtime=col(source(s), name("jobtime"))
>  GUIDE: axis(dim(1), label("Previous Experience (months)"))
>  GUIDE: axis(dim(2), label("Months since Hire"))
>  SCALE: linear(aesthetic(aesthetic.size), aestheticMinimum(size."5px"))
> ELEMENT: point(position(bin.hex(prevexp*jobtime, dim(1,2))),
> size(summary.count()), color.interior(color.blue))
> END GPL.
>
> The IGRAPH procedure also provides a jittering option which nudges the
> points slightly apart by adding a small amount of random variation, but I
> think the GPL binhex approach works much better.
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 18:39:32 +0200
> From:    Georg Maubach <[hidden email]>
> Subject: AW:      Re: Old Sunflower Option
>
> Hi All,
>
> We tried to run the syntax example on our "Employee data.sav" file. As we
> have a German version, the variable names were translated into German. Does
> anybody know how we could obtain the sample files in English?
>
> Best regards
>
> Georg Maubach
> Research Manager
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 13:28:22 -0500
> From:    "Hashmi, Syed S" <[hidden email]>
> Subject: Re: Random date generator
>
> Thanks Melissa,
>
> I'm already doing something similar to what you said, using the 1st, 10th
> and 20th as the dates for the start, middle and end of the month. The
> month is a bit trickier, since my exposures are events during pregnancy,
> so I have to be careful about just assigning a random month lest it fall
> outside the pregnancy duration.
>
> The question I had asked concerned dates which had a month and year but
> no day information, not even beginning, middle or end. Therefore, I
> couldn't even rely on the 1st-10th-20th coding.
>
> My final solution, for the case where I had a start date and a stop
> month, was to pick a random date between the start date (or the first day
> of the stop month, whichever is later) and the last day of the stop month.
> Gene Maguin had emailed me earlier and suggested I use the UNIFORM
> function to randomly select a date. I don't think that message was posted
> on the list-serv, so the body is copied below. I've used the function
> since, and it works nicely.
>
> Thanks again for your help though. I agree that there should be some
> sort of algorithm in place at the interviewer level to minimize the
> frequency of incomplete data.
>
> - S. Hashmi
>
> *copy of email from Gene*
>
> > -----Original Message-----
> > From: Gene Maguin [mailto:[hidden email]]
> > Sent: Friday, July 20, 2007 8:05 AM
> > To: Hashmi, Syed S
> > Subject: RE: Random date generator
> >
> > Syed,
> >
> > I'd like to be helpful to you but I don't have time to make up a full
> > solution. I think this would be a valid example of your question.
> >
> > Start date (mm/dd/yyyy): 5/x/2004
> > Stop date (mm/dd/yyyy):  6/17/2004
> > Possible duration range: (6/17/2004)-(5/31/2004)=17 days to
> > (6/17/2004)-(5/18/2004)=30 days (I assume a 30-day month)
> >
> > So x has to be between 18 and 31 inclusive.
> >
> > So I think the trick to the random draw is this command.
> >
> > Compute x=uniform(14).
> > Compute x=trunc(x).
> >
> > Check this, but I'm pretty sure that the range of x will be 0 to 13.
> > Your actual day is then:
> >
> > Compute x=x+18.
> >
> > There are lots of big 'little bits' to tidy up, but this will get you
> > what you want when the tidying up has been done.
> >
> > Best wishes, Gene Maguin
> >
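Gene's two COMPUTE steps can be checked outside SPSS. The Python sketch below mirrors them: random.random() * 14 stands in for UNIFORM(14), and math.trunc stands in for TRUNC. It confirms that the draws cover exactly the days 18 through 31. The seed value is arbitrary. In plain Python, rng.randint(18, 31) would do the same job in one call.

```python
import math
import random

rng = random.Random(12345)  # any fixed seed makes the draws reproducible


def draw_day(lo=18, hi=31):
    """Mirror Gene's steps: trunc(uniform(n)) + lo with n = hi - lo + 1
    gives an integer day from lo to hi inclusive."""
    n = hi - lo + 1              # 14 possible days (18..31)
    x = rng.random() * n         # like SPSS UNIFORM(14): 0 <= x < 14
    return math.trunc(x) + lo    # like TRUNC(x), then + 18


days = [draw_day() for _ in range(5000)]
print(min(days), max(days))
```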
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 15:05:07 -0400
> From:    Richard Ristow <[hidden email]>
> Subject: Re: Random date generator
>
> Somehow I missed or deleted the original posting in this thread.
> Anyway, on Thursday, July 19, 2007 9:48 PM Hashmi, Syed S asked,
>
> >A dataset that I'm analyzing has a set of dates for events (start and
> >stop dates) as well as how long those events lasted. The data for
> >each date is in three variables (month, day, year). The years are
> >pretty complete if they are filled in, but the month and day are
> >sometimes listed as the exact month or date and other times as
> >beginning, middle or end of the year (for the month variable) or of the
> >month (for the day variable).
> >
> >I have [two dates as three variables each, plus a duration].
> >I have the complete start and stop date for about half the cases. The
> >rest are missing parts of one of the dates (e.g., the day) or of
> >both.  If I have one of the dates and a duration, I can calculate the
> >other date.
>
> So far, so good, though be careful about how precise your 'durations'
> are.
>
> >There is a small subset of the population where I have the complete
> >stop date but am missing the start day (I have the year and month) and
> >am also missing the duration.  I had to come up with some way to
> >impute a start date for these cases for analysis (which will be done
> >with and without these specific cases).  I know that the event could
> >not be more than a month long. I was planning to calculate the earliest
> >possible start date (e_startdt) up to a month before the stop date and
> >then randomly pick a date between e_startdt and the stop date.
>
> OUCH! I would not do this. Period.
>
> *MAYBE* the start dates and durations you get this way will be vaguely
> representative of the population of events, though I doubt it. Are your
> durations roughly uniformly distributed from 0 to 30 days? For goodness'
> sake, you ought to check that before proceeding.
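Richard's check can be done with a chi-square goodness-of-fit test on the observed durations. The rough Python sketch below assumes whole-day durations from 0 to 30, grouped into 5-day bins; the bin width is an arbitrary choice. With the defaults there are 5 degrees of freedom, and a statistic above about 11.07 (the 5% critical value) would argue against uniformity. The test is only trustworthy when every bin expects at least about 5 durations.

```python
from collections import Counter


def uniformity_chi2(durations, max_days=30, width=5):
    """Pearson chi-square statistic for 'durations are uniform on 0..max_days'.

    Durations are grouped into bins of `width` days; the last bin also takes
    any leftover days (25-30 with the defaults). Durations outside
    0..max_days are dropped first. Returns (statistic, degrees of freedom).
    """
    kept = [d for d in durations if 0 <= d <= max_days]
    if not kept:
        raise ValueError("no durations in range")
    n_bins = (max_days + 1) // width

    def bin_of(d):
        return min(d // width, n_bins - 1)

    observed = Counter(bin_of(d) for d in kept)
    days_per_bin = Counter(bin_of(d) for d in range(max_days + 1))
    n = len(kept)
    stat = 0.0
    for b in range(n_bins):
        # expected count is proportional to the number of days in the bin
        expected = n * days_per_bin[b] / (max_days + 1)
        stat += (observed[b] - expected) ** 2 / expected
    return stat, n_bins - 1


stat, df = uniformity_chi2([3, 8, 12, 17, 21, 26, 29, 5, 14, 30])
print(round(stat, 2), df)
```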
>
> But even if they're representative of the population, they have nothing
> to do with the individual cases for which they're 'imputed'. No
> analysis using those 'dates' will be in the least trustworthy.
>
> A far better approach is to use true missing-value interpolation on the
> *durations*, not the dates. (See SPSS 'MVA'.) I'm not clear how many
> durations you'd have to impute. If it's near 50%, that won't be at all
> reliable, either.
>
> -Good luck,
>   Richard
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 15:09:53 -0500
> From:    "Hashmi, Syed S" <[hidden email]>
> Subject: Re: Random date generator
>
> Richard,
>
> Thanks for your input. I realize that I was stepping into extremely
> treacherous territory when I decided to impute dates and select random
> ones. As for the durations being roughly uniformly distributed, that's
> what it looks like from the data I do have. Initially, I'd assumed that
> durations would have a mean of about 7 days, but the data I do have
> don't seem to show that; they are more or less uniformly distributed.
> There were some durations of more than 30 days, but I doubt they're
> accurate. Therefore, I decided to go ahead with the uniform distribution
> (although the whole imputation and random selection still bothers me).
>
> The reason that I'm trying to get an idea about the dates, especially
> the event start dates, is due to the nature of the study question. I'm
> looking at the occurrence of certain events during pregnancy.  However,
> these events of interest have to occur within the first trimester, or if
> I narrow it down further, the first two months of pregnancy.  Therefore,
> I have to know if an event occurred within a certain period of time
> after the last menstrual date as reported by the woman.  At the end of
> the day, the variables for all the events get filtered down to a single
> dichotomous variable - Y/N did the event occur during the period of
> interest?
>
> I will do the analysis with and without the cases where the dates have
> been imputed from incomplete data.  I hadn't previously thought of using
> true-missing value interpolation on the durations but I'll look into it.
> I've never done that before so will have to read up a bit on it.  I
> might have an issue with the number of missing values, though, since more
> cases have at least some part of the date than have a duration value.
>
> Thanks again for your advice. It's always nice to get a fresh look at an
> issue.
>
> - Shahrukh
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 16:24:09 -0400
> From:    Gene Maguin <[hidden email]>
> Subject: Re: Random date generator
>
> Syed,
>
> It sounds like you are going to use the imputed dates to decide if something
> happened or not. The new variable, 'something happened or not' might be a
> dependent variable or it might be an independent variable. There's a
> literature on estimating relationships in the presence of missing data. To
> correctly estimate relationships (or, at least, come very close), you should
> use either multiple imputation or a maximum likelihood estimation method
> that incorporates the EM algorithm. So far as I know, SPSS has neither. The
> key person here is Donald Rubin. But, there are other, more recent articles.
>
> Gene Maguin
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 20:43:06 +0000
> From:    Hamish Travers <[hidden email]>
> Subject: Can someone please tell me how to unsubscribe from this forum,
>          thanks in advance
>
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 16:41:18 -0500
> From:    "Hashmi, Syed S" <[hidden email]>
> Subject: Re: Random date generator
>
> Thanks Gene,
>
> After the comments that you and Richard made, I'm seriously rethinking
> the whole thing.  Maximum likelihood estimation was something
> that I had thought of initially but didn't follow up on.  I guess it's
> time that I do.  Thanks again for your help.
>
> - Shahrukh
>
>
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 18:33:35 -0400
> From:    Richard Ristow <[hidden email]>
> Subject: Re: Can someone please tell me how to unsubscribe
>
> FAQ: How to unsubscribe, or leave list SPSSX-L:
>
> Requests to unsubscribe that are posted to the list will never be
> acted on.
>
> You must send the request to [hidden email].
>
> From the e-mail address from which you're subscribed to the list, send
> a message to [hidden email] with the following words in the
> body of the message:
>
> SIGNOFF SPSSX-L
>
> Don't put anything else (your name, etc.) in the body of the message.
>
> It should work. If it doesn't, go to the following Web page:
>
> http://www.listserv.uga.edu/cgi-bin/wa?SUBED1=spssx-l&A=1
>
> and unsubscribe from there.
>
> ...........................
> More information:
>
> When you subscribed to the list, you received a welcome message (I'm
> copying it below) with instructions (including asking you to save it).
>
> From the welcome message:
>
> >Your  subscription to  the SPSSX-L  list (SPSSX(r)  Discussion)
> >has  been accepted.
> >
> >Please save this message for future  reference, [...]
> >
> >You may leave the list at any time by sending a "SIGNOFF SPSSX-L"
> >command to [hidden email].
>
> There are many other commands that can be sent to the same address, to
> manage your subscription. If you send mail to [hidden email]
> with the text
>
> INFO REFCARD
>
> and no other text, you will be mailed a file describing those commands.
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 16:00:22 -0700
> From:    Karen Powers <[hidden email]>
> Subject: RENAME LOOP?
>
> Hello SPSS list,
> I have a dataset in which the variable names are var002, var003   ...
> var3477.
>
> I would like to RENAME each variable with the number listed in row 1 of
> each respective column.
> The first number in var002 is 6951030.  I would like var002 renamed
> "rs6951030".
> The command that does this nicely for a single variable is:
>
> RENAME VARS var002 = "rs"+ "6951030".
> EXE.
>
> Now I would like to do this for all vars through var3477.
> I have tried DO REPEAT and LOOP commands but they both say
>  >Warning # 141.  Command name: RENAME VARS
>  >DO REPEAT has no effect on this command.
>
> Any ideas on how I can do this (3477 times)?
>
> Thanks, Karen
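DO REPEAT cannot generate RENAME commands, but the command text itself can be built programmatically and then run. A minimal sketch in Python, assuming the row-1 values have already been read into a dict keyed by the old variable name (how they are read is left open); `build_rename_syntax` is a hypothetical helper, not an SPSS function, and the second variable's value below is made up for illustration. It emits a single RENAME VARIABLES command with one (old=new) pair per variable.

```python
def build_rename_syntax(first_row):
    """Build one RENAME VARIABLES command from a dict of old name -> row-1 value.

    Hypothetical helper: first_row might look like {"var002": 6951030, ...}.
    """
    pairs = []
    for old_name, value in first_row.items():
        # int() drops any trailing ".0", so 6951030.0 becomes "rs6951030"
        new_name = "rs" + str(int(value))
        pairs.append("(" + old_name + "=" + new_name + ")")
    # One (old=new) pair per line keeps the generated syntax readable
    return "RENAME VARIABLES " + "\n  ".join(pairs) + "."


print(build_rename_syntax({"var002": 6951030, "var003": 1234567}))
```

With the Python plug-in (SPSS 14 and later), the resulting string could be passed to `spss.Submit()` inside BEGIN PROGRAM ... END PROGRAM instead of being pasted into a syntax window.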
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 19:27:19 -0400
> From:    Richard Ristow <[hidden email]>
> Subject: Re: Aggregating with missing data
>
> At 03:30 AM 7/14/2007, Marco wrote:
>
> >When using the aggregate (mean) function in SPSS, cells that contain
> >missing data become empty. Thus, a cell that should contain the mean
> >of multiple cells (one of which is empty/missing), turns to zero
> >because it contains one missing datum.
>
> I'm not sure what's happening to you, but you shouldn't be seeing what
> you say you're seeing.
>
> From your description, it sounds like you're doing one of two things:
> a) Using the MEAN function with command AGGREGATE to average over a set
> of cases
> b) Using the MEAN function in the transformation language to average
> over a set of variables.
>
> BOTH of those, however, ignore missing values when averaging, and take
> the mean of the non-missing values; they don't make a value 0 because
> there's a missing value in the list. (That would be a very dangerous
> thing to do anyway.) So I'm not sure what's happening.
>
> Could you post the syntax, some test data, what output you get, and
> tell us what output you want?
>
> .....................................
> Here are demonstrations of averaging across cases with AGGREGATE, and
> averaging across variables. It's SPSS 15 draft output (WRR-not saved
> separately).
> .....................................
> Using AGGREGATE to average over cases:
> List
> |-----------------------------|---------------------------|
> |Output Created               |25-JUL-2007 19:16:03       |
> |-----------------------------|---------------------------|
> [Aggregate]
>
> Group Value
>
>      1     1
>      1     2
>      1     3
>      2     4
>      2     5
>      2     .
>      2     7
>      3     .
>      3     9
>      3    10
>
>
> Number of cases read:  10    Number of cases listed:  10
>
>
> AGGREGATE OUTFILE=*
>    /BREAK=GROUP
>    /Members 'Size of group' = NU
>    /MEAN    'Mean of "value"' = MEAN(VALUE).
>
> LIST.
>
> List
> |-----------------------------|---------------------------|
> |Output Created               |25-JUL-2007 19:16:03       |
> |-----------------------------|---------------------------|
> Group Members     MEAN
>
>      1       3     2.00
>      2       4     5.33
>      3       3     9.50
>
> Number of cases read:  3    Number of cases listed:  3
> .....................................
> Using MEAN to average over variables:
> List
> |-----------------------------|---------------------------|
> |Output Created               |25-JUL-2007 19:16:04       |
> |-----------------------------|---------------------------|
> [Wide]
>
> Group Members Value.1 Value.2 Value.3 Value.4
>
>      1      3       1       2       3       .
>      2      4       4       5       .       7
>      3      3       .       9      10       .
>
>
> Number of cases read:  3    Number of cases listed:  3
>
>
> NUMERIC Mean (F6.2).
> COMPUTE Mean = MEAN(Value.1 TO Value.4).
> LIST.
>
> List
> |-----------------------------|---------------------------|
> |Output Created               |25-JUL-2007 19:16:26       |
> |-----------------------------|---------------------------|
> [Wide]
>
> Group Members Value.1 Value.2 Value.3 Value.4   Mean
>
>      1      3       1       2       3       .    2.00
>      2      4       4       5       .       7    5.33
>      3      3       .       9      10       .    9.50
>
> Number of cases read:  3    Number of cases listed:  3
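The behaviour in both listings (missing values skipped, never counted as zero) can be cross-checked outside SPSS. A minimal Python sketch using the same ten test cases, with `None` standing in for system-missing; `group_means` is an illustrative name only, and it mirrors the AGGREGATE version (NU plus MEAN per break group).

```python
from collections import defaultdict

# Test data as shown in the first LIST; None stands for system-missing (".")
rows = [(1, 1), (1, 2), (1, 3),
        (2, 4), (2, 5), (2, None), (2, 7),
        (3, None), (3, 9), (3, 10)]


def group_means(rows):
    """Per group: (number of cases, mean of the non-missing values)."""
    members = defaultdict(int)
    valid = defaultdict(list)
    for group, value in rows:
        members[group] += 1            # NU counts every case in the break group
        if value is not None:          # missing values are skipped, never treated as 0
            valid[group].append(value)
    return {g: (members[g], sum(valid[g]) / len(valid[g]) if valid[g] else None)
            for g in sorted(members)}


for group, (n, mean) in group_means(rows).items():
    print(group, n, round(mean, 2))
```

A group whose values are all missing gets `None` for its mean, much as AGGREGATE returns system-missing in that case.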
>
> ===================
> APPENDIX:  All code
> ===================
> I keyed the test data into the Data Editor; however, it can be
> recovered from the LIST output fairly easily. Here's all the code:
>
> DATASET ACTIVATE TestData.
> DATASET COPY     Aggregate.
> DATASET ACTIVATE Aggregate WINDOW=FRONT.
>
> LIST.
>
> AGGREGATE OUTFILE=*
>    /BREAK=GROUP
>    /Members 'Size of group' = NU
>    /MEAN    'Mean of "value"' = MEAN(VALUE).
>
> LIST.
>
> DATASET ACTIVATE TestData.
> DATASET COPY     Wide.
> DATASET ACTIVATE Wide      WINDOW=FRONT.
>
> SORT CASES BY Group .
> CASESTOVARS
>   /ID = Group
>   /GROUPBY = VARIABLE
>   /COUNT = Members "Size of group" .
>
> LIST.
>
> NUMERIC Mean (F6.2).
> COMPUTE Mean = MEAN(Value.1 TO Value.4).
>
> LIST.
>
> ------------------------------
>
> Date:    Wed, 25 Jul 2007 20:24:54 -0400
> From:    Richard Ristow <[hidden email]>
> Subject: Re: Matching files on one of three possible ID's
>
> At 09:29 AM 7/12/2007, Daniel Robertson wrote:
>
> >I have two database extracts that I am trying to merge, one of which
> >contains enrolled students and the other contains approximately the
> >same group of students when they were applicants. In the Enrollment
> >file students are uniquely identified by 'enroll_id'. In the Applicant
> >file there is a primary ID, 'applicant_id1', but there may be up to
> >two other IDs which were issued and updated provisionally as the
> >student was going through the application process. The rub is that
> >'enroll_id' may match any one of the applicant IDs, not necessarily
> >the primary one.
>
> Gene's given you a workable solution. It requires sorting the data
> three times;  but with three keys, something like that is inevitable.
>
> You *can* combine the three sorting operations into one step, by using
> XSAVE to create three copies of each Applicant record, one each in
> which 'enroll_id' is loaded from each of the three candidate key
> variables in the Applicant file. Then sort the resulting file by that
> 'enroll_id', MATCH FILES with the Enrollment file, and discard any
> Applicant records that don't match.
>
> Now, that's the simplest possible case. You may need logic in case,
> say, the same ID value occurs in more than one of the Applicant-record
> fields. But it's another way to go.
>
> Sorry; no code this time.
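For illustration only, the same logic sketched in Python rather than SPSS syntax. Assumptions: enrollment records sit in a dict keyed by `enroll_id`, and each applicant record is a dict whose three ID fields are named `applicant_id1` (from the original post), `applicant_id2` and `applicant_id3` (hypothetical names), with missing IDs stored as `None`. A dict lookup stands in for the SORT plus MATCH FILES step.

```python
def match_on_any_id(enrollments, applicants):
    """Pair each enrolled student with the applicant record that carries their ID.

    enrollments: dict mapping enroll_id -> enrollment record.
    applicants:  list of dicts with keys applicant_id1, applicant_id2, applicant_id3
                 (the last two are hypothetical names and may be None).
    """
    id_fields = ("applicant_id1", "applicant_id2", "applicant_id3")
    matches = []
    for app in applicants:
        seen = set()
        # Like XSAVE: one copy of the applicant record per non-missing candidate key
        for field in id_fields:
            candidate = app.get(field)
            if candidate is None or candidate in seen:
                continue  # skip empty fields and an ID repeated in two fields
            seen.add(candidate)
            # Like MATCH FILES plus discarding non-matches: keep only copies
            # whose key is an enroll_id
            if candidate in enrollments:
                matches.append((candidate, enrollments[candidate], app))
    return matches
```

The check for an ID repeated across fields is the kind of extra logic mentioned above; the sketch does not check whether one `enroll_id` matches more than one applicant record.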
>
> ------------------------------
>
> End of SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)
> **************************************************************
>
>


--

Re: SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)

Lemon, John S.
In reply to this post by Melissa Ives
I would agree that if you are upgrading, you should wait for 16, which will have an improved interface, the ability to read and write Office 2007 files, and all the usual improvements from one version to the next. If the final version is as good as the beta then it will be great, and of course the final version should be even better.

If I were investing that amount of money I'd wait for 16. I believe it should be out in Q3 2007.

Best Wishes

John S. Lemon
DIT - University of Aberdeen
Edward Wright Building: Room G51
Tel:  +44 1224 273350
Fax: +44 1224 273372


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Melissa Ives
Sent: 26 July 2007 21:11
To: [hidden email]
Subject: Re: SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)

Hmm, I would think about looking into Version 16 which is about to come out in a few months and promises to be less 'quirky' than 15.

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Chelminski, Iwona
Sent: Thursday, July 26, 2007 10:52 AM
To: [hidden email]
Subject: Re: [SPSSX-L] SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)

Hi Group,
I'm thinking about buying the 15 version. Is it any good? They want over $1,500 for it so I want to make sure that it's not a piece of crap like the version 10 was.
Any comments?
Thanks in advance

Iwona


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
Automatic digest processor
Sent: Thursday, July 26, 2007 12:01 AM
To: Recipients of SPSSX-L digests
Subject: SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)


There are 24 messages totalling 1629 lines in this issue.

Topics of the day:

  1. SPSS 15 loses reg. with Vista
  2. ANNOUNCE: SPSS USERS MEETING (INC BOOKING FORM) REMINDER
  3. data dictionary (2)
  4. include (2)
  5. stepwise regression how to include all cases despite missing data (3)
  6. Random date generator (6)
  7. stepwise regression how to include all cases despite missing
     data
  8. Old Sunflower Option (2)
  9. AW:      Re: Old Sunflower Option
 10. Can someone please tell me how to unsubscribe from this forum, thanks in
     advance
 11. Can someone please tell me how to unsubscribe
 12. RENAME LOOP?
 13. Aggregating with missing data
 14. Matching files on one of three possible ID's

----------------------------------------------------------------------

Date:    Wed, 25 Jul 2007 04:36:06 -0400
From:    Nico Munting <[hidden email]>
Subject: Re: SPSS 15 loses reg. with Vista

I hope you have solved this problem by now, but if you haven't you might
try the following.

I do not have experience with SPSS on Vista, but my guess is that SPSS is
trying to store the registration information in C:\Program Files\ or the
equivalent in your situation. However, because in Vista you are not running
all programs as Administrator, SPSS does not have write-access to the
Program Files directory.

To give SPSS write access to the Program Files directory, right-click on the
shortcut you are using to launch SPSS and select "Run As...". Choose an
account with Administrative privileges and you might be prompted for a
password or a confirmation. After this SPSS should start normally. Now
complete the registration, and since SPSS was started in Administrative
mode, it should be able to write the registration information to the
Program Files directory.

If all is well you should be able to start SPSS normally from this point
on, and hopefully it remembers your registration.

Good luck,

Nico


On Wed, 4 Jul 2007 06:33:47 +0100, David Hitchin <[hidden email]>
wrote:

>I have installed SPSS 15.0, including the patch to take it to 15.0.1 and
>the additional patch to cope with Vista problems. The new licence
>procedure works, and SPSS functions as expected.
>
>At some point, either when SPSS is stopped or the machine is turned off,
>the registration is lost; next time SPSS is started it has to be
>re-registered.
>
>Any ideas?
>
>David Hitchin

------------------------------

Date:    Wed, 25 Jul 2007 10:14:57 +0100
From:    Peter Watson <[hidden email]>
Subject: ANNOUNCE: SPSS USERS MEETING (INC BOOKING FORM) REMINDER



A reminder of the ASSESS meeting in November:

ASSESS: SPSS USERS' GROUP
21st ANNUAL MEETING
FRIDAY 9th NOVEMBER 2007
ALCUIN RESEARCH RESOURCE CENTRE AUDITORIUM
UNIVERSITY OF YORK, YORK


ASSESS is an independent user group for SPSS, a computer package for analysing
and presenting data. It is run by users, for users and is completely independent
of manufacturers of the software. The meeting is open to all users of SPSS and
to anyone interested in SPSS.

Come along to:
* hear SPSS users talk about applications, the problems and solutions
* hear the latest news from SPSS UK staff about product developments,
   and put your questions to them
* question a panel of experts about particular problems
* exchange ideas with other SPSS users
* plan for an even better user group.

The venue is the Alcuin Research Resource Centre (ARRC) on York University
campus located in Heslington, 2 miles to the south-east of the city centre.
It takes 10-15 minutes in a taxi from the railway station. The Number 4 bus
runs regularly to the University from York railway station (see
http://www.yorkshiretravel.net/). Parking at the University is very difficult.
Location details are at http://www.york.ac.uk/np/maps/. Accommodation is
bookable via tourist information on (01904) 621756 or (01904) 554455.

THE PROVISIONAL PROGRAMME**

* Welcome and introduction to meeting

* SPSS company and product news; SPSS software demonstrations

* How and why to document data for long-term storage, and what's special
about GI (geographical) data? by Allan Reese, CEFAS

* Making the world a better place with SPSS: analysing & predicting
charity donor behaviour using SPSS Base
by John Sauve-Rodd, Datapreneurs

* Applications of OMS by Gilbert MacKenzie, University of Limerick

* Multivariate aspects of testing the savannah hypothesis of shopping
by Charles Dennis, Brunel University

* Mousing with SPSS: useful point and click
by Frances Provan, University of Edinburgh


* Users' Question Time and Clinic

* Annual General Meeting of ASSESS.

Registration and coffee will start at 10am. Papers and other events will run
from 10.30am to about 5.00pm. Morning coffee, lunch and afternoon tea are
included in the registration fee. A timetable will be e-mailed to delegates
in advance of the meeting.




______________________
** The titles and order of events are subject to amendment.
--------------------------------------------------------------------------------

                              BOOKING FORM
                        ASSESS : SPSS USERS' GROUP
         Friday 9th November 2007, ARRC auditorium, University of York

Important:

Bookings will not be treated as firm until a cheque or official (company) order,
payable to ASSESS, is received. Payment possible by BACS. Details on request.


Name:  ______________________________ Tel: ____________________
Email: ______________________________ Fax: ____________________

Job Title:    ___________________________________
Organization: ___________________________________
Address:      ___________________________________________________
               ___________________________________________________
               _______________________   Postcode  _______________

Strike out the sections which do not apply to you, or otherwise amend as
appropriate:

INDIVIDUAL BOOKING. Please reserve a place for me, at a cost of 65 GBP.

CORPORATE BOOKING. (Enter the appropriate amounts)

Please reserve ______ places, at a cost of £_____ (65 GBP for the first person,
and 55 GBP for each subsequent person).

Names of attendees : 1. _______________________________________
(for badges)         2. _______________________________________
                      3. _______________________________________
                      4. _______________________________________
                      5. _______________________________________

STUDENT (POST-GRADUATE) BOOKING. (Enclose photocopied evidence of status
for 2007-2008 academic year). Please reserve for me one of the student
places at a cost of 40 GBP.

Specify vegetarian or other dietary requirements, if any:
________________________________________________________________

Cheque or official order enclosed for _______GBP

For official orders please also give here the number and address for invoicing:
________________________________________________________________
________________________________________________________________
________________________________________________________________


(Please indicate if you require a receipt of payment)

Return completed forms to: Peter Watson, MRC Cognition and Brain
Sciences Unit, 15 Chaucer Road, Cambridge, CB2 7EF.

Telephone enquiries about bookings: 01223 355294 x801 (has an answerphone)

E-mail enquiries about bookings: [hidden email]
(important: put "ASSESS" in the Subject field)


------------------------------

Date:    Wed, 25 Jul 2007 03:41:21 -0700
From:    Albert-jan Roskam <[hidden email]>
Subject: Re: data dictionary

DISPLAY VARIABLES / VARIABLES = myvar1 myvar2.
DISPLAY DICTIONARY / VARIABLES = myvar1 myvar2.

Will display only the variables of interest.


--- "Parry, James" <[hidden email]> wrote:

> Hi Ken,
>
> Try File . . . Display Data File Information . . .
> Working File (if you have the data open). I believe
> the only way to limit the variables displayed in
> this command is to physically drop the variables.
>
> -HTH
>
> -----Original Message-----
> From: SPSSX(r) Discussion
> [mailto:[hidden email]] On Behalf Of Ken
> Wood
> Sent: Tuesday, July 24, 2007 1:13 PM
> To: [hidden email]
> Subject: data dictionary
>
> How does one save (in order to print) the
> information about all the variables in a given
> dataset?  That is, the information (or selected
> information) that one sees in the Variable View?
>


Cheers!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Did you know that 87.166253% of all statistics claim a precision of results that is not justified by the method employed? [HELMUT RICHTER]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




------------------------------

Date:    Wed, 25 Jul 2007 12:48:59 +0200
From:    Maddalena Agonigi <[hidden email]>
Subject: include

I am new to SPSS programming and so any help will be deeply appreciated.

I use INSERT in a Script

INSERT FILE='C:\Test.sps'
  SYNTAX=BATCH Error=Stop CD=YES.

but SPSS gives me the following message

INSERT <quì>FILE='C:\Test.sps
(0) Istruzione non valida. [Invalid statement.; "quì" = "here"]

Thank you all

------------------------------

Date:    Wed, 25 Jul 2007 05:55:45 -0700
From:    Albert-jan Roskam <[hidden email]>
Subject: Re: include

Hi,

--What SPSS version are you using? INSERT has been
implemented relatively recently (spss v13+ (?)). Did
you try using INCLUDE already?
--Are you absolutely sure the sps file is where you
say it is?

Albert-Jan


------------------------------

Date:    Wed, 25 Jul 2007 09:49:25 -0400
From:    Ken Wood <[hidden email]>
Subject: Re: data dictionary

Thank you for the many suggestions.  For those interested, the suggestions I received are below.


You can go to the File menu and choose the option to display information
about the current working file or an external file. You will get output
that you can save in .spo format or even copy to Word, Excel or any other
software.


Do Display Dictionary, then go to the Output, select the information
displayed, and either print from there, or export to either RTF or
Excel formats. Recent versions of SPSS provide information on
variable names and structure separately from value labels, which is
a bit annoying.


Try this command in syntax:

display dictionary.


DISPLAY VARIABLES / VARIABLES = myvar1 myvar2.
DISPLAY DICTIONARY / VARIABLES = myvar1 myvar2.

Will display only the variables of interest.




*OMS
*  /SELECT TABLES
*  /EXCEPTIF LABELS =['Notes']
*  /DESTINATION
*        Format = Text
*        OUTFILE = "dictionary.txt".
*DISPLAY DICTIONARY.
*OMSEND.



>
> Try File . . . Display Data File Information . . .
> Working File (if you have the data open). I believe
> the only way to limit the variables displayed in
> this command is to physically drop the variables.
>
>
>
> -----Original Message-----
> From: SPSSX(r) Discussion
> [mailto:[hidden email]] On Behalf Of Ken
> Wood
> Sent: Tuesday, July 24, 2007 1:13 PM
> To: [hidden email]
> Subject: data dictionary
>
> How does one save (in order to print) the
> information about all the variables in a given
> dataset?  That is, the information (or selected
> information) that one sees in the Variable View?
>


Ken Wood, PhD
Research Scientist
KMRREC
West Orange, NJ
973-243-6871

------------------------------

Date:    Wed, 25 Jul 2007 21:58:49 +0800
From:    Hunna Watson <[hidden email]>
Subject: stepwise regression how to include all cases despite missing data

Hi all,

I'm running a stepwise regression of organizational practices on
construction projects that predict project cost growth. I have data for
115 projects, yet some organizational practices were not applicable on
some projects (in a random fashion). The missing data is obviously
purposeful and not due to not filling in questionnaires etc. SPSS
automatically excludes cases with any missing values, or wants to
substitute a value, so I end up with a regression being carried out on
10 projects, obviously not useful. Any suggestions for syntax to include
all cases or suggestions to rectify this problem?

Thanks in advance,
Hunna Watson

------------------------------

Date:    Wed, 25 Jul 2007 09:12:37 -0500
From:    Melissa Ives <[hidden email]>
Subject: Re: Random date generator

During the assessment process, interviewers are given these instructions
for estimating a date when a client cannot remember specific days or
months. Perhaps you could create a similar algorithm?

Date Guidelines (d/e):  Use the following rules if the participant is
unsure of the exact date:
DAY: Use the 5th for the beginning of the month, 15th for the middle of
the month, and 25th for the end of the month.
MONTH: Use March for early in the year, July for middle of the year, and
October for later in the year, but try to make it so the number of weeks
is about right.
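The day and month rules above amount to a small lookup table. A minimal Python sketch, assuming (hypothetically) that vague entries are stored as the strings "beginning", "middle" and "end" and exact entries as integers; the real coding in the data set may differ. The "number of weeks about right" adjustment is a judgment call and is not encoded here.

```python
from datetime import date

# Hypothetical codes for vague entries; the real coding in the data set may differ
DAY_FOR = {"beginning": 5, "middle": 15, "end": 25}    # day-of-month rule
MONTH_FOR = {"beginning": 3, "middle": 7, "end": 10}   # month-of-year rule


def estimate_date(year, month, day):
    """Build a date, swapping vague month/day codes for the guideline values.

    month and day are ints when known exactly, otherwise one of the codes above.
    """
    m = MONTH_FOR[month] if isinstance(month, str) else month
    d = DAY_FOR[day] if isinstance(day, str) else day
    return date(year, m, d)


print(estimate_date(2006, "middle", "end"))   # vague month and vague day
print(estimate_date(2006, 2, "beginning"))    # exact month, vague day
```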

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Syed Hashmi
Sent: Thursday, July 19, 2007 9:48 PM
To: [hidden email]
Subject: [SPSSX-L] Random date generator

Dear co-listers,

A dataset that I'm analyzing has a set of dates for events (start and
stop dates) as well as how long those events occurred for.  The data for each
date is in three variables (month, day, year). The years are pretty
complete if they are filled in, but the month and day are sometimes
listed as the exact month or date and other times they're listed as
beginning, middle or end of the year (for the month variable) or the
month (for the day variable).

Thus, I have 7 vars (startm, startd, starty, stopm, stopd, stopy,
duration) from which I can deduce the start and stop dates (startdt,
stopdt).
Unfortunately, I have the complete start and stop dates for only about half
the cases. The rest are missing either parts of one of the dates (e.g.
day) or of both.  If I have one of the dates and a duration, I can
calculate the other date.

The reason for this post is that there is a small subset of the
population where I have the complete stop date but am missing the start
day (I have the year and month) and am also missing the duration.  I had
to come up with some way to impute a start date for these cases for
analysis (which will be done with and without these specific cases).  I
know that the event could not be more than a month long. Therefore, what
I was planning on doing was based on the information I have, calculate
the earliest possible start date (e_startdt) up to a month before the
stop date and then randomly pick a date between e_startdt and the stop
date.

Therefore, my query here was this: how can I code for this. I have an
idea of how to do it in SAS but since I'm working in SPSS that doesn't
help much.  I'm assuming that it will be something simple like:

     startdt = e_startdt + RANDOM_DAYS.

where RANDOM_DAYS is a random whole number of days between 0 and
DATEDIFF(stopdt, e_startdt, "days").

So how would I go about doing this? I tried using the help files and all
but couldn't come up with something that worked. Is this the best way to
do this? Any other way that I can do this? Does it matter what kind of
seeding I use for the random number generator?

Thanks.

- Shahrukh
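Setting aside the objections Richard raised in his reply, the mechanics being asked about are a uniform draw of a whole number of days within the window. A minimal Python sketch, assuming the earliest possible start date (`e_start`) and the stop date are already known; `impute_start` is a hypothetical name. In SPSS itself dates are stored in seconds, so a day count would have to be scaled by 86400 before being added to a date variable.

```python
import random
from datetime import date, timedelta


def impute_start(stop, e_start, rng):
    """Draw a start date uniformly from e_start..stop, both ends included."""
    span = (stop - e_start).days  # same quantity as DATEDIFF(stopdt, e_startdt, "days")
    if span < 0:
        raise ValueError("earliest start date falls after the stop date")
    # randint(a, b) includes both a and b, so every day in the window is possible
    return e_start + timedelta(days=rng.randint(0, span))


# A fixed seed (the analogue of SPSS's SET SEED) makes the imputation reproducible
rng = random.Random(2007)
print(impute_start(date(2007, 3, 20), date(2007, 2, 18), rng))
```

On the seeding question: the kind of seed matters less than recording it, so that the "with imputed cases" analysis can be rerun and gives the same dates.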



------------------------------

Date:    Wed, 25 Jul 2007 11:13:35 -0300
From:    Hector Maletta <[hidden email]>
Subject: Re: stepwise regression how to include all cases despite missing data

         Hunna,
         Your use of the stepwise method for regression instead of running
with all the variables at once is immaterial for your problem. What you need
is some way of dealing with projects where some organizational practice does
not apply.
         You do not give many details about the variables, but I imagine
that each organizational practice might be a dummy variable, either present
or absent. In such case, you may posit its effect on costs not as a result
of "choosing or not choosing it when it is adequate to choose it" but as a
result of its mere presence. A project may benefit from a practice if (a)
the practice is applicable and (b) it is actually used; otherwise the
project does not benefit from that practice. The absence of a practice may
thus be a result of deliberate choice or impossibility of application, but
in either case it would result in its effect not being observed. In other
words, you may (if your particular situation affords this interpretation)
treat the "missing" cases as negative instances, as zeroes in the dummies,
and proceed with the regression.
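Hector's suggestion amounts to recoding "not applicable" as 0 before fitting. Here is a minimal sketch of why that matters for the case count. It uses made-up data: four hypothetical projects, practice indicators `p1` and `p2`, and an outcome `cost`. Listwise deletion keeps only rows with every predictor present. Recoding missing practices to 0 keeps all of them. In SPSS itself the recode would be something like RECODE p1 TO p87 (SYSMIS=0), where the variable names are again hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def complete_cases(rows: List[Row], predictors: List[str]) -> List[Row]:
    """Listwise deletion: keep only rows where every predictor is present."""
    return [r for r in rows if all(r[p] is not None for p in predictors)]


def missing_as_not_used(rows: List[Row], predictors: List[str]) -> List[Row]:
    """Recode a missing (not applicable) practice to 0, i.e. 'practice not used'."""
    return [
        {k: (0 if k in predictors and v is None else v) for k, v in r.items()}
        for r in rows
    ]


# Hypothetical projects: 1 = practice used, 0 = not used, None = not applicable.
projects = [
    {"p1": 1, "p2": 0, "cost": 4.0},
    {"p1": None, "p2": 1, "cost": 2.5},
    {"p1": 1, "p2": None, "cost": 3.1},
    {"p1": None, "p2": None, "cost": 5.2},
]
practices = ["p1", "p2"]

print(len(complete_cases(projects, practices)))                                  # 1
print(len(complete_cases(missing_as_not_used(projects, practices), practices)))  # 4
```

The recode changes what each coefficient means: it becomes the effect of using the practice versus not using it, for whatever reason. That is exactly the conceptual condition Hector attaches to the approach.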
         If this road is not conceptually adequate, you're in trouble.
         Hector


         -----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Hunna Watson
Sent: 25 July 2007 10:59
To: [hidden email]
Subject: stepwise regression how to include all cases despite missing data

         Hi all,

         I'm running a stepwise regression of organizational practices on
construction projects that predict project cost growth. I have data for 115
projects, yet some organizational practices were not applicable on some
projects (in a random fashion). The missing data is obviously purposeful and
not due to not filling in questionnaires etc. SPSS automatically excludes
cases with any missing values, or wants to substitute a value, so I end up
with a regression being carried out on 10 projects, obviously not useful.
Any suggestions for syntax to include all cases or suggestions to rectify
this problem?

         Thanks in advance,
         Hunna Watson

------------------------------

Date:    Wed, 25 Jul 2007 10:24:27 -0400
From:    Mark A Davenport MADAVENP <[hidden email]>
Subject: Re: stepwise regression how to include all cases despite missing data

Hector,

When I worked at ACT, Inc., we often treated the student's school identifier
this way, usually with great success.  Granted, we had many thousands of
cases to draw from.  Hunna only has 115?  She is going to run out of cases
pretty quickly, don't you think?


***************************************************************************************************************************************************************
Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more
than an exact answer to an approximate question.' --a paraphrase of J. W.
Tukey (1962)

------------------------------

Date:    Wed, 25 Jul 2007 11:36:49 -0300
From:    Hector Maletta <[hidden email]>
Subject: Re: stepwise regression how to include all cases despite missing
         data

Hunna:

Even with a scale, the "missing" responses can be reinterpreted as saying
"This practice was not effective in this case because, for one reason or
another, it was not used". This is not quite clean conceptually, but it is
your only choice unless you are willing to work with only 10 cases.

The problem, apparently, is in the design of the questionnaire, asking for
the effectiveness of a practice that is not in universal use among the cases
under analysis. In any case, a practice cannot have any effectiveness if it
is not used, so I insist you can treat it as having zero effectiveness when
it was not used.

On the other hand, since your variables seem to be many, and your cases seem
to be few, perhaps you should consider a more artisanal approach for
identifying effective strategies instead of your all-in-one regression.
With 87 predictors and 115 cases you don't have a chance even without a
single missing value.
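Hector's last point is simple arithmetic. This sketch assumes ordinary least squares with an intercept, where the residual degrees of freedom are n - k - 1. Stepwise does not enter all 87 predictors at once, but it still screens all 87 candidates against these few cases.

```python
def residual_df(n_cases: int, n_predictors: int) -> int:
    """Residual degrees of freedom of an OLS fit with an intercept: n - k - 1."""
    return n_cases - n_predictors - 1


def cases_per_predictor(n_cases: int, n_predictors: int) -> float:
    """How many cases are available per candidate predictor."""
    return n_cases / n_predictors


print(residual_df(115, 87))                     # 27
print(round(cases_per_predictor(115, 87), 2))   # 1.32
print(residual_df(10, 87))                      # -78: the full model cannot be fitted at all
```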

Hector

  _____

From: Hunna Watson [mailto:[hidden email]]
Sent: 25 July 2007 11:36
To: Hector Maletta
Subject: FW: Re: stepwise regression how to include all cases despite
missing data



Thanks for your reply. I've just come on board this project in the past two
weeks, and the data have already been collected. This is essentially what
happened, though I'm simplifying: respondents rated how effective the use
of each strategy was for preventing cost growth in the form of work that
had to be done again on the project, so I have data on a scale and no
possibility of coding absent or present :S

Extra information....

Yes, I know all the horrible things about stepwise, but it is the only
suitable method I can think of to answer the research questions. I have just
come on board the project in the last two weeks. The research is very
exploratory, and the topic hasn't been examined before. Data have been
collected on many different predictor variables (design-related sources,
subcontractor sources, site management sources, contract documentation; the
list goes on and on, up to a terrible 87 predictors). There are 115
projects, so each is a case if you like, and we want to look at this data
set first (no options there), but after that we can merge it with another
data set containing information on a further 160 projects. Some predictors
weren't relevant to some projects. For instance, some didn't use incentives,
but we have ratings on scales of 1 to 5 (assessing raters' perceptions of
the contribution of that method to costs), and we are seeking to predict
costs from the predictor variables. If a method wasn't applicable (e.g., use
of a particular incentive plan), it was left blank on the questionnaire. No
logical ordering.

------------------------------

Date:    Wed, 25 Jul 2007 11:48:56 -0400
From:    "William B. Ware" <[hidden email]>
Subject: Old Sunflower Option

In the older versions of SPSS, there was something called the sunflower
option to show multiple cases at the same point in a scatter plot.  Does
anyone know how to do that in Version 14 and up?

Bill

__________________________________________________________________________
William B. Ware, Professor                         Educational Psychology,
CB# 3500                                       Measurement, and Evaluation
University of North Carolina                         PHONE  (919)-962-7848
Chapel Hill, NC      27599-3500                      FAX:   (919)-962-1533
Office:  118 Peabody Hall                            EMAIL: [hidden email]
Adjunct Professor                                    School of Social Work
__________________________________________________________________________

------------------------------

Date:    Wed, 25 Jul 2007 10:31:24 -0600
From:    ViAnn Beadle <[hidden email]>
Subject: Re: Old Sunflower Option

The nearest equivalent I can think of to the old sunflower option is the
bin.hex function in GPL. It groups together nearby points, and then you use
the summary.count function to count the number of hits within each bin to
set the size of the point displayed. This produces what is sometimes called
a bubble plot.

Here's some sample syntax which produces a "bubble" plot using the Employee
data.sav sample file. The SCALE statement constrains the minimum point size
to 5 pixels. IMHO, the default sizing creates really, really small 1-pixel
points which look like dust on my monitor, so this gets around that. The
color.interior function fills in the points (which default to circles). I
think the default hollow points are ugly, so this gets around that too.


GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=prevexp jobtime MISSING=
  LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
 SOURCE: s=userSource(id("graphdataset"))
 DATA: prevexp=col(source(s), name("prevexp"))
 DATA: jobtime=col(source(s), name("jobtime"))
 GUIDE: axis(dim(1), label("Previous Experience (months)"))
 GUIDE: axis(dim(2), label("Months since Hire"))
 SCALE: linear(aesthetic(aesthetic.size), aestheticMinimum(size."5px"))
 ELEMENT: point(position(bin.hex(prevexp*jobtime, dim(1,2))),
   size(summary.count()), color.interior(color.blue))
END GPL.

The IGRAPH procedure also provides a jittering option which nudges the
points slightly apart by adding a small amount of random variation, but I
think the GPL binhex approach works much better.
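GPL's bin.hex does the binning internally. The following is purely an illustration of the idea and is not SPSS's implementation. It is a Python sketch of hexagonal binning that uses the common axial-coordinate and cube-rounding method. The cell size and the function names are assumptions. Each hexagon's count is what would drive the marker size.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def hex_cell(x: float, y: float, size: float) -> Tuple[int, int]:
    """Axial (q, r) coordinates of the pointy-top hexagon of radius `size`
    that contains the point (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Cube coordinates satisfy cx + cy + cz == 0; round each one...
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    # ...then recompute the one with the largest rounding error so the sum stays 0.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def hex_center(q: int, r: int, size: float) -> Tuple[float, float]:
    """Plot position of a hexagon's centre, given its axial coordinates."""
    return size * math.sqrt(3) * (q + r / 2), size * 1.5 * r


def hex_bin_counts(points: Iterable[Tuple[float, float]],
                   size: float) -> Dict[Tuple[float, float], int]:
    """Map each occupied hexagon's centre to the number of points inside it."""
    counts = Counter(hex_cell(x, y, size) for x, y in points)
    return {hex_center(q, r, size): n for (q, r), n in counts.items()}


# Three nearly coincident points share one bubble; the distant point gets its own.
print(hex_bin_counts([(0.0, 0.0), (0.1, 0.1), (0.05, 0.0), (10.0, 10.0)], size=1.0))
```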

------------------------------

Date:    Wed, 25 Jul 2007 18:39:32 +0200
From:    Georg Maubach <[hidden email]>
Subject: AW:      Re: Old Sunflower Option

Hi All,

We tried to run the syntax example on our "Employee data.sav" file. Because we have a German version, the variable names were translated into German. Does anybody know how we could obtain the sample files in English?

Best regards

Georg Maubach
Research Manager

------------------------------

Date:    Wed, 25 Jul 2007 13:28:22 -0500
From:    "Hashmi, Syed S" <[hidden email]>
Subject: Re: Random date generator

Thanks Melissa,

I'm already doing something similar to what you said, using the 1st, 10th
and 20th as the dates for the start, middle and end of the month. The month
is a bit trickier, since my exposures are events during pregnancy, so I
have to be careful about just assigning a random month lest it fall
outside the pregnancy duration.

The question I had asked concerned dates which had month and year but
no day information, not even beginning, middle or end. Therefore, I
couldn't even depend on the 1st-10th-20th coding.

My final solution for the problem where I had a start date and had a
stop month was to pick a random date between the start date (or the
first date of the stop month) and the last day of the stop month.  Gene
Maguin had emailed me earlier and suggested I use the UNIFORM function
to randomly select a date.  I don't think that message was posted on the
listserv, so the body is copied below. I've used the function since
and it works nicely.
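A sketch of the final approach described above, in Python for illustration only. The function name is hypothetical. The window runs from the later of the known start date and the first day of the stop month to the last day of the stop month. A whole number of days is then drawn uniformly within that window, much as TRUNC(UNIFORM(n)) does in Gene's message. `calendar.monthrange` supplies the month length, leap years included.

```python
import calendar
import random
from datetime import date, timedelta


def impute_stop_date(startdt: date, stop_year: int, stop_month: int,
                     rng: random.Random) -> date:
    """Draw a stop date uniformly from max(startdt, first of stop month)
    through the last day of the stop month, both ends included."""
    first_of_month = date(stop_year, stop_month, 1)
    days_in_month = calendar.monthrange(stop_year, stop_month)[1]  # handles leap years
    last_of_month = date(stop_year, stop_month, days_in_month)
    low = max(startdt, first_of_month)
    if low > last_of_month:
        raise ValueError("start date falls after the stop month")
    offset = rng.randint(0, (last_of_month - low).days)           # inclusive draw
    return low + timedelta(days=offset)


rng = random.Random(2007)  # fixed seed so the imputation can be repeated
print(impute_stop_date(date(2004, 2, 20), 2004, 2, rng))  # some date from 2004-02-20 to 2004-02-29
```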

Thanks again for your help though. I agree that there should be some
sort of algorithm in place at the interviewer level to minimize the
frequency of incomplete data.

- S. Hashmi

*copy of email from Gene*

> -----Original Message-----
> From: Gene Maguin [mailto:[hidden email]]
> Sent: Friday, July 20, 2007 8:05 AM
> To: Hashmi, Syed S
> Subject: RE: Random date generator
>
> Syed,
>
> I'd like to be helpful to you but I don't have time to make up a full
> solution. I think this would be a valid example of your question.
>
> Start date (mm/dd/yyyy): 5/x/2004
> Stop date (mm/dd/yyyy):  6/17/2004
> Possible duration range (6/17/2004)-(5/31/2004)=17 days to
> (6/17/2004)-(5/18/2004)=30 days (I assume a 30 day month)
>
> So x has to be between 18 and 31 inclusive.
>
> So I think the trick to the random draw is this command.
>
> Compute x=uniform(14).
> Compute x=trunc(x).
>
> Check this but I'm pretty sure that the range of x will be 0 to 13.
> Your actual day is then:
>
> Compute x=x+18.
>
> There's lots of big 'little bits' to tidy up but this will get you what
> you want when the tidying up has been done.
>
> Best wishes, Gene Maguin
>
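Gene's arithmetic can be checked quickly. This small Python sketch uses a hypothetical helper, `day_from_uniform`. It reproduces the 17- and 30-day bounds. It also confirms that truncating a draw from [0, 14) and adding 18 yields the days 18 through 31. Like Gene, it assumes that UNIFORM(14) never returns exactly 14.

```python
import math
from datetime import date

stop = date(2004, 6, 17)
shortest = (stop - date(2004, 5, 31)).days   # latest possible start, May 31
longest = (stop - date(2004, 5, 18)).days    # earliest possible start, May 18


def day_from_uniform(u: float) -> int:
    """Gene's mapping: TRUNC of a draw u in [0, 14), plus 18, gives a May start day."""
    return math.trunc(u) + 18


# Sweep u over 0.0, 0.1, ..., 13.9 to list every day the mapping can produce.
possible_days = sorted({day_from_uniform(k / 10) for k in range(140)})

print(shortest, longest)                                        # 17 30
print(possible_days[0], possible_days[-1], len(possible_days))  # 18 31 14
```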


> -----Original Message-----
> From: Melissa Ives [mailto:[hidden email]]
> Sent: Wednesday, July 25, 2007 9:13 AM
> To: Hashmi, Syed S; [hidden email]
> Subject: RE: [SPSSX-L] Random date generator
>
> During the assessment process, interviewers are given these instructions
> for estimating a date when a client cannot remember specific days or
> months. Perhaps you could create a similar algorithm?
>
> Date Guidelines (d/e):  Use the following rules if the participant is
> unsure of the exact date:
> DAY: Use the 5th for the beginning of the month, 15th for the middle of
> the month, and 25th for the end of the month.
> MONTH: Use March for early in the year, July for middle of the year, and
> October for later in the year, but try to make it so the number of weeks
> is about right.
>
> Melissa
>
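Melissa's interviewer rules translate directly into lookup tables. This is a minimal sketch. The category labels ("beginning", "early", and so on) and the function names are hypothetical. The final clause about keeping the number of weeks about right still needs human judgment, so it is not coded. Hashmi's 1st/10th/20th scheme would only change the day values.

```python
from typing import Optional

# Stand-in values from the quoted guidelines; the dictionary keys are hypothetical codes.
DAY_FOR_PART_OF_MONTH = {"beginning": 5, "middle": 15, "end": 25}
MONTH_FOR_PART_OF_YEAR = {"early": 3, "middle": 7, "late": 10}  # March, July, October


def estimate_day(day: Optional[int], part_of_month: Optional[str]) -> Optional[int]:
    """Use the reported day if there is one, otherwise the rule-based stand-in."""
    if day is not None:
        return day
    if part_of_month is None:
        return None  # nothing to go on; leave it missing
    return DAY_FOR_PART_OF_MONTH[part_of_month]


def estimate_month(month: Optional[int], part_of_year: Optional[str]) -> Optional[int]:
    """Use the reported month if there is one, otherwise the rule-based stand-in."""
    if month is not None:
        return month
    if part_of_year is None:
        return None
    return MONTH_FOR_PART_OF_YEAR[part_of_year]


print(estimate_day(None, "middle"), estimate_month(None, "late"))  # 15 10
```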
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
> Syed Hashmi
> Sent: Thursday, July 19, 2007 9:48 PM
> To: [hidden email]
> Subject: [SPSSX-L] Random date generator
>
> Dear co-listers,
>
> A dataset that I'm analyzing has a set of dates for events (start and
> stop dates) as well as how long those events occurred for. The data for
> each date are in three variables (month, day, year). The years are
> pretty complete if they are filled in, but the month and day are
> sometimes listed as the exact month or date and other times as
> beginning, middle or end of the year (for the month variable) or of the
> month (for the day variable).
>
> Thus, I have 7 vars (startm, startd, starty, stopm, stopd, stopy,
> duration) from which I can deduce the start and stop dates (startdt,
> stopdt).
> Unfortunately, I have the complete start and stop date for only about
> half the cases. The rest are missing either parts of one of the dates
> (eg. day) or of both. If I have one of the dates and a duration, I can
> calculate the other date.
>
> The reason for this post is that there is a small subset of the
> population where I have the complete stop date but am missing the start
> day (I have the year and month) and am also missing the duration. I had
> to come up with some way to impute a start date for these cases for
> analysis (which will be done with and without these specific cases). I
> know that the event could not be more than a month long. Therefore, what
> I was planning on doing was, based on the information I have, to
> calculate the earliest possible start date (e_startdt) up to a month
> before the stop date and then randomly pick a date between e_startdt and
> the stop date.
>
> Therefore, my query here was this: how can I code for this? I have an
> idea of how to do it in SAS but since I'm working in SPSS that doesn't
> help much. I'm assuming that it will be something simple like:
>
>      startdt = e_startdt + RANDOM_DAYS.
>
> where RANDOM_DAYS is a random whole number between 0 and DATEDIFF(stopdt,
> e_startdt, "days").
>
> So how would I go about doing this? I tried using the help files and all
> but couldn't come up with something that worked. Is this the best way to
> do this? Any other way that I can do this? Does it matter what kind of
> seeding I use for the random number generator?
>
> Thanks.
>
> - Shahrukh
>

------------------------------

Date:    Wed, 25 Jul 2007 15:05:07 -0400
From:    Richard Ristow <[hidden email]>
Subject: Re: Random date generator

Somehow I missed or deleted the original posting in this thread.
Anyway, on Thursday, July 19, 2007 9:48 PM Hashmi, Syed S asked,

>A dataset that I'm analyzing has a set of dates for events (start and
>stop dates) as well as how long those events occurred for.  The data
>for each date is in three variables (month, day, year). The years are
>pretty complete if they are filled in but the month and day are
>sometimes listed as the exact month or date and other times they're
>listed as beginning, middle or end of the year (for the month
>variable) or the month (for the day variable).
>
>I have [two dates as three variables each, plus a duration].
>I have the complete start and stop date for about half the cases. The
>rest are missing either parts of one of the dates (eg. day) or for
>both.  If I have one of the dates and a duration, I can calculate the
>other date.

So far, so good, though be careful about how precise your 'durations'
are.

>There is a small subset of the population where I have the complete
>stop date but am missing the start day (I have the year and month) and
>am also missing the duration.  I had to come up with some way to
>impute a start date for these cases for analysis (which will be done
>with and without these specific cases).  I know that the event could
>not be more than a month long. I was planning to calculate the earliest
>possible start date (e_startdt) up to a month before the stop date and
>then randomly pick a date between e_startdt and the stop date.

OUCH! I would not do this. Period.

*MAYBE* the start dates and durations you get this way will be vaguely
representative of the population of events, though I doubt it. Are your
durations roughly uniformly distributed from 0 to 30 days? For goodness'
sake, you ought to check that before proceeding.
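One way to do the check Richard asks for is a chi-square goodness-of-fit test against a uniform distribution. This sketch assumes integer durations from 0 to 30 days, grouped into four roughly week-long bins. The bin count is a choice, not a rule. Durations above 30 would need to be set aside first. The usual chi-square approximation also wants expected counts of about 5 or more per bin.

```python
from typing import Sequence, Tuple


def uniformity_chi_square(durations: Sequence[int], max_days: int = 30,
                          n_bins: int = 4) -> Tuple[float, int]:
    """Pearson chi-square statistic comparing integer durations 0..max_days
    against a discrete uniform distribution, after grouping the days into
    n_bins bins. Returns (statistic, degrees of freedom)."""
    n_days = max_days + 1

    def bin_of(day: int) -> int:
        return day * n_bins // n_days          # 0 .. n_bins - 1

    # Number of possible day values in each bin (bins can differ by one day).
    width = [0] * n_bins
    for day in range(n_days):
        width[bin_of(day)] += 1

    observed = [0] * n_bins
    for d in durations:
        if not 0 <= d <= max_days:
            raise ValueError(f"duration {d} is outside 0..{max_days}")
        observed[bin_of(d)] += 1

    n = len(durations)
    stat = 0.0
    for obs, w in zip(observed, width):
        expected = n * w / n_days              # uniform: proportional to bin width
        stat += (obs - expected) ** 2 / expected
    return stat, n_bins - 1


# Made-up example: durations spread evenly over 0..30 days give a statistic of 0.
stat, df = uniformity_chi_square(list(range(31)))
print(stat, df)   # 0.0 3
```

With four bins there are 3 degrees of freedom. A statistic above about 7.815, the 5% critical value, would argue against treating the durations as uniform.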

But even if they're representative of the population, they have nothing
to do with the individual cases for which they're 'imputed'. No
analysis using those 'dates' will be the least trustworthy.

A far better approach is to use true missing-value interpolation on the
*durations*, not the dates. (See SPSS 'MVA'.) I'm not clear how many
durations you'd have to impute. If it's near 50%, that won't be at all
reliable, either.

-Good luck,
  Richard

------------------------------

Date:    Wed, 25 Jul 2007 15:09:53 -0500
From:    "Hashmi, Syed S" <[hidden email]>
Subject: Re: Random date generator

Richard,

Thanks for your input. I realize that I was stepping into extremely
treacherous territory when I decided to impute dates and select random
ones.  As for the durations being roughly uniformly distributed, that's
what it looks like from the data I do have.  Initially, I'd assumed that
durations would have a mean of about 7 days but somehow the data I do
have doesn't seem to show that.  It's more or less uniformly
distributed.  There were some durations that were >30 days but I doubt
if they're true.  Therefore, I decided to go ahead with the uniform
distribution (although, the whole imputation and random selection still
bothers me).

The reason that I'm trying to get an idea about the dates, especially
the event start dates, is due to the nature of the study question. I'm
looking at the occurrence of certain events during pregnancy.  However,
these events of interest have to occur within the first trimester, or if
I narrow it down further, the first two months of pregnancy.  Therefore,
I have to know if an event occurred within a certain period of time
after the last menstrual date as reported by the woman.  At the end of
the day, the variables for all the events get filtered down to a single
dichotomous variable - Y/N did the event occur during the period of
interest?
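The Y/N flag described above could be derived along the following lines. This sketch rests on assumptions that are mine, not the study's. First, an event counts if any part of it falls between the LMP date and LMP + `window_days`. Second, the window length is a parameter, for example 60 days for "the first two months". Unknown dates return None rather than N, so they stay missing. Running the flag once with the imputed dates and once without those cases gives the with/without comparison mentioned above.

```python
from datetime import date, timedelta
from typing import Optional


def event_in_window(lmp: date, event_start: Optional[date], event_stop: Optional[date],
                    window_days: int) -> Optional[bool]:
    """True if the event overlaps [lmp, lmp + window_days], False if it does not,
    None if either event date is unknown (so the flag stays missing)."""
    if event_start is None or event_stop is None:
        return None
    window_end = lmp + timedelta(days=window_days)
    return event_start <= window_end and event_stop >= lmp


lmp = date(2007, 1, 1)
print(event_in_window(lmp, date(2007, 2, 10), date(2007, 2, 20), 60))  # True
print(event_in_window(lmp, date(2007, 3, 10), date(2007, 3, 20), 60))  # False
```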

I will do the analysis with and without the cases where the dates have
been imputed from incomplete data. I hadn't previously thought of using
true missing-value interpolation on the durations, but I'll look into it.
I've never done that before, so I will have to read up a bit on it. I
might have an issue with the number of missing values, though, since more
cases have at least some part of the date than a duration value.

Thanks again for your advice. It's always nice to get a fresh look at an
issue.

- Shahrukh

------------------------------

Date:    Wed, 25 Jul 2007 16:24:09 -0400
From:    Gene Maguin <[hidden email]>
Subject: Re: Random date generator

Syed,

It sounds like you are going to use the imputed dates to decide if something
happened or not. The new variable, 'something happened or not', might be a
dependent variable or it might be an independent variable. There's a
literature on estimating relationships in the presence of missing data. To
correctly estimate relationships (or, at least, come very close), you should
use either multiple imputation or a maximum likelihood estimation method
that incorporates the EM algorithm. So far as I know, SPSS has neither. The
key person here is Donald Rubin, but there are other, more recent articles.
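For reference, multiple imputation ends with a pooling step. Each imputed data set is analysed separately, and the results are then combined with Rubin's rules. The following sketch covers only that combination step, for a single coefficient. The function name is hypothetical, and the sketch does not perform the imputation itself. The pooled estimate is the mean of the m estimates. The total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Combine one coefficient across m imputed data sets with Rubin's rules.
    estimates[i] and variances[i] are the estimate and its squared standard
    error from the i-th completed data set. Returns (pooled estimate, total variance)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divides by m - 1)
    return q_bar, w + (1 + 1 / m) * b


est, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, round(total_var, 4))   # 2.0 1.8333
```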

Gene Maguin

------------------------------

Date:    Wed, 25 Jul 2007 20:43:06 +0000
From:    Hamish Travers <[hidden email]>
Subject: Can someone please tell me how to unsubscribe from this forum,
         thanks in advance

------------------------------

Date:    Wed, 25 Jul 2007 16:41:18 -0500
From:    "Hashmi, Syed S" <[hidden email]>
Subject: Re: Random date generator

Thanks Gene,

After the comments that you and Richard made, I'm seriously considering
rethinking the whole thing. Maximum likelihood estimation was something
that I had thought of initially but didn't follow up on.  I guess it's
time that I do.  Thanks again for your help.

- Shahrukh

------------------------------

Date:    Wed, 25 Jul 2007 18:33:35 -0400
From:    Richard Ristow <[hidden email]>
Subject: Re: Can someone please tell me how to unsubscribe

FAQ: How to unsubscribe, or leave list SPSSX-L:

Requests to unsubscribe that are posted to the list will never be
acted on.

You must send the request to [hidden email].

From the e-mail address from which you're subscribed to the list, send
a message to [hidden email] with the following words in the
body of the message:

SIGNOFF SPSSX-L

Don't put anything else (your name, etc.) in the body of the message.

It should work. If it doesn't, go to the following Web page:

http://www.listserv.uga.edu/cgi-bin/wa?SUBED1=spssx-l&A=1

and unsubscribe from there.

...........................
More information:

When you subscribed to the list, you received a welcome message (I'm
copying it below) with instructions (including asking you to save it).

From the welcome message:

>Your subscription to the SPSSX-L list (SPSSX(r) Discussion)
>has been accepted.
>
>Please save this message for future reference, [...]
>
>You may leave the list at any time by sending a "SIGNOFF SPSSX-L"
>command to [hidden email].

There are many other commands that can be sent to the same address, to
manage your subscription. If you send mail to [hidden email]
with the text

INFO REFCARD

and no other text, you will be mailed a file describing those commands.

------------------------------

Date:    Wed, 25 Jul 2007 16:00:22 -0700
From:    Karen Powers <[hidden email]>
Subject: RENAME LOOP?

Hello SPSS list,
I have a dataset in which the variable names are var002, var003 ...
var3477.

I would like to RENAME each variable using the number listed in row 1 of
its column.
The first number in var002 is 6951030, so I would like var002 renamed
"rs6951030".
The command that does this nicely for a single variable is:

RENAME VARS var002 = "rs"+ "6951030".
EXE.

Now I would like to do this for all vars through var3477.
I have tried DO REPEAT and LOOP commands, but they both say:
>Warning # 141.  Command name: RENAME VARS
>DO REPEAT has no effect on this command.

Any ideas on how I can do this (3477 times)?

Thanks, Karen
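Because RENAME VARIABLES is not affected by DO REPEAT or LOOP, one workaround (a suggestion added here, not a reply from the list) is to generate the whole command as text and then run it, for example by pasting it into a syntax window, or by saving it to a file and running that file with INCLUDE. The Python sketch below assumes the variable names and their row-1 values have already been exported in matching order; the function name `rename_syntax` and its `prefix` parameter are hypothetical. It emits one RENAME VARIABLES command using the `(old=new)` pair form.

```python
def rename_syntax(names, first_row, prefix="rs"):
    """Build one RENAME VARIABLES command that renames each variable
    to prefix + the number found in row 1 of that column."""
    if len(names) != len(first_row):
        raise ValueError("need exactly one row-1 value per variable")
    pairs = []
    for old, value in zip(names, first_row):
        # Row-1 values may arrive as floats (6951030.0) or strings;
        # int() keeps only the whole-number part.
        new = f"{prefix}{int(value)}"
        pairs.append(f"({old}={new})")
    # One pair per line keeps the generated syntax readable.
    return "RENAME VARIABLES " + "\n  ".join(pairs) + "."
```

`[f"var{i:03d}" for i in range(2, 3478)]` produces the names var002 through var3477 in order. Since row 1 holds labels rather than data, that case will likely need to be dropped once the renaming is done.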

------------------------------

Date:    Wed, 25 Jul 2007 19:27:19 -0400
From:    Richard Ristow <[hidden email]>
Subject: Re: Aggregating with missing data

At 03:30 AM 7/14/2007, Marco wrote:

>When using the aggregate (mean) function in SPSS, cells that contain
>missing data become empty. Thus, a cell that should contain the mean
>of multiple cells (one of which is empty/missing), turns to zero
>because it contains one missing datum.

I'm not sure what's happening to you, but you shouldn't be seeing what
you say you're seeing.

From your description, it sounds like you're doing one of two things:
a) Using the MEAN function with command AGGREGATE to average over a set
of variables
b) Using the MEAN function in the transformation language to average
over a set of variables.

BOTH of those, however, ignore missing values when averaging, and take
the mean of the non-missing values; they don't make a value 0 because
there's a missing value in the list. (That would be a very dangerous
thing to do anyway.) So I'm not sure what's happening.

Could you post the syntax, some test data, what output you get, and
tell us what output you want?

.....................................
Here are demonstrations of averaging across cases with AGGREGATE, and
averaging across variables. It's SPSS 15 draft output (WRR-not saved
separately).
.....................................
Using AGGREGATE to average over cases:
List
|-----------------------------|---------------------------|
|Output Created               |25-JUL-2007 19:16:03       |
|-----------------------------|---------------------------|
[Aggregate]

Group Value

     1     1
     1     2
     1     3
     2     4
     2     5
     2     .
     2     7
     3     .
     3     9
     3    10


Number of cases read:  10    Number of cases listed:  10


AGGREGATE OUTFILE=*
   /BREAK=GROUP
   /Members 'Size of group' = NU
   /MEAN    'Mean of "value"' = MEAN(VALUE).

LIST.

List
|-----------------------------|---------------------------|
|Output Created               |25-JUL-2007 19:16:03       |
|-----------------------------|---------------------------|
Group Members     MEAN

     1       3     2.00
     2       4     5.33
     3       3     9.50

Number of cases read:  3    Number of cases listed:  3
.....................................
Using MEAN to average over variables:
List
|-----------------------------|---------------------------|
|Output Created               |25-JUL-2007 19:16:04       |
|-----------------------------|---------------------------|
[Wide]

Group Members Value.1 Value.2 Value.3 Value.4

     1      3       1       2       3       .
     2      4       4       5       .       7
     3      3       .       9      10       .


Number of cases read:  3    Number of cases listed:  3


NUMERIC Mean (F6.2).
COMPUTE Mean = MEAN(Value.1 TO Value.4).
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |25-JUL-2007 19:16:26       |
|-----------------------------|---------------------------|
[Wide]

Group Members Value.1 Value.2 Value.3 Value.4   Mean

     1      3       1       2       3       .    2.00
     2      4       4       5       .       7    5.33
     3      3       .       9      10       .    9.50

Number of cases read:  3    Number of cases listed:  3

===================
APPENDIX:  All code
===================
I keyed the test data into the Data Editor; however, it can be
recovered from the LIST output fairly easily. Here's all the code:

DATASET ACTIVATE TestData.
DATASET COPY     Aggregate.
DATASET ACTIVATE Aggregate WINDOW=FRONT.

LIST.

AGGREGATE OUTFILE=*
   /BREAK=GROUP
   /Members 'Size of group' = NU
   /MEAN    'Mean of "value"' = MEAN(VALUE).

LIST.

DATASET ACTIVATE TestData.
DATASET COPY     Wide.
DATASET ACTIVATE Wide      WINDOW=FRONT.

SORT CASES BY Group .
CASESTOVARS
  /ID = Group
  /GROUPBY = VARIABLE
  /COUNT = Members "Size of group" .

LIST.

NUMERIC Mean (F6.2).
COMPUTE Mean = MEAN(Value.1 TO Value.4).

LIST.
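The same arithmetic can be checked outside SPSS. This minimal Python sketch (hypothetical function name `aggregate_mean`; `None` stands in for a system-missing value) mirrors the AGGREGATE run above: every case counts toward the group size, as with NU, but only non-missing values enter the mean.

```python
def aggregate_mean(rows):
    """rows: (group, value) pairs; value None means missing.
    Returns {group: (number of cases, mean of non-missing values)}."""
    groups = {}
    for group, value in rows:
        groups.setdefault(group, []).append(value)
    result = {}
    for group, values in groups.items():
        valid = [v for v in values if v is not None]
        members = len(values)  # like NU: counts every case, missing or not
        # like MEAN: average of the valid values only (None if all missing)
        mean = sum(valid) / len(valid) if valid else None
        result[group] = (members, mean)
    return result
```

With the ten test cases listed above, the means come out 2.00, 5.33 and 9.50, matching the LIST output. Treating the missing value as 0 would instead give 4.00 for group 2.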

------------------------------

Date:    Wed, 25 Jul 2007 20:24:54 -0400
From:    Richard Ristow <[hidden email]>
Subject: Re: Matching files on one of three possible ID's

At 09:29 AM 7/12/2007, Daniel Robertson wrote:

>I have two database extracts that I am trying to merge, one of which
>contains enrolled students and the other contains approximately the
>same group of students when they were applicants. In the Enrollment
>file students are uniquely identified by 'enroll_id'. In the Applicant
>file there is a primary ID, 'applicant_id1', but there may be up to
>two other IDs which were issued and updated provisionally as the
>student was going through the application process. The rub is that
>'enroll_id' may match any one of the applicant IDs, not necessarily
>the primary one.

Gene's given you a workable solution. It requires sorting the data
three times, but with three keys, something like that is inevitable.

You *can* combine the three sorting operations into one step, by using
XSAVE to create three copies of each Applicant record, one each in
which 'enroll_id' is loaded from each of the three candidate key
variables in the Applicant file. Then sort the resulting file by that
'enroll_id', MATCH FILES with the Enrollment file, and discard any
Applicant records that don't match.

Now, that's the simplest possible case. You may need logic in case,
say, the same ID value occurs in more than one of the Applicant-record
fields. But it's another way to go.

Sorry; no code this time.
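A rough Python sketch of the XSAVE idea, under stated assumptions: each applicant record is a dict with hypothetical keys `applicant_id1` to `applicant_id3` (None when absent), and a dictionary lookup stands in for SPSS's sort plus MATCH FILES. Each applicant is copied once per distinct non-missing ID, and only copies whose ID appears in the Enrollment file are kept. The function name `match_on_any_id` is also hypothetical.

```python
ID_FIELDS = ("applicant_id1", "applicant_id2", "applicant_id3")


def match_on_any_id(enrollment_ids, applicants):
    """Return {enroll_id: applicant record} for each enrollment ID that
    matches any of an applicant's three IDs."""
    # Step 1 (the XSAVE idea): one keyed copy per distinct non-missing ID.
    keyed = []
    for rec in applicants:
        ids = {rec.get(field) for field in ID_FIELDS}
        ids.discard(None)  # a set also drops an ID repeated within one record
        for key in ids:
            keyed.append((key, rec))
    # Step 2: keep only copies whose key is an enrollment ID
    # (a set lookup replaces the sort plus MATCH FILES).
    wanted = set(enrollment_ids)
    matches = {}
    for key, rec in keyed:
        if key in wanted:
            # If two applicants share an ID, the first one seen is kept.
            matches.setdefault(key, rec)
    return matches
```

Collapsing each record's IDs into a set handles the case where the same value sits in more than one ID field of one record. The "first applicant wins" rule for an ID shared by two applicants is only a placeholder; real data may need that case handled differently.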

------------------------------

End of SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)
**************************************************************

Re: SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)

Marta Garcia-Granero
In reply to this post by dnyeboa
Hi Debra

First of all, if you write a message by hitting "Reply", please delete all
the irrelevant text (as I have done); it can be a bit confusing.

Now, your questions:
> I'm new to SPSS and I wanted to know if there is a way that SPSS keeps track of
> what has been done to a variable. I know that you can paste your actions to
> Syntax and thus save a program. But does anyone know how one might retrieve
> the programming of a variable if it was not initially pasted and saved?
>
That "programming" doesn't exist; you are mixing Excel with SPSS, I'm
afraid. If you didn't keep the syntax, it is gone forever, unless... see
below.
> Also, how do you find the "log" of your actions in SPSS?
>
Try to find a file named "spss.jnl". It can be in different places
(C:\Windows\Temp, the user's temporary folder...); I recommend using
Windows' Start -> Search -> Files or Folders to locate it.
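If Windows Search is not convenient, a short Python sketch can walk a few folders and list every spss.jnl it finds. The function name `find_journal_files` is hypothetical, and which folders to search is up to you; C:\Windows\Temp and the folder returned by `tempfile.gettempdir()` are reasonable places to start.

```python
import os


def find_journal_files(roots, name="spss.jnl"):
    """Walk each root folder (including subfolders) and return the full
    paths of files called `name`, ignoring upper/lower case."""
    target = name.lower()
    hits = []
    for root in roots:
        # os.walk yields nothing for a folder that does not exist.
        for folder, _subdirs, files in os.walk(root):
            hits.extend(os.path.join(folder, f) for f in files
                        if f.lower() == target)
    return hits
```

For example, `find_journal_files([r"C:\Windows\Temp", tempfile.gettempdir()])` (after `import tempfile`) returns a list of paths; the most recently modified file is usually the one worth opening.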

Last hope for the "programming of a variable" item you asked about
above: if all the modifications the variable underwent were done on
the same computer, the syntax might still be there in the "spss.jnl"
file. Good luck!

Regards,
Marta Garcia-Granero

Re: SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)

Muir Houston
In addition to Marta's comments, you can set the display to show commands in your output; that way, if you save the output, your commands will be saved too.
Go to 'Edit', click 'Options', and when the dialog box opens, click 'Viewer' and make sure the small check box labelled 'Display commands in log' is ticked.
Muir Houston
Research Fellow
CRLL
Institute of Education
University of Stirling
FK9 4LA
01786-46-7615

________________________________

From: SPSSX(r) Discussion on behalf of Marta Garcia-Granero
Sent: Fri 27/07/2007 09:47
To: [hidden email]
Subject: Re: SPSSX-L Digest - 24 Jul 2007 to 25 Jul 2007 (#2007-207)






--
The University of Stirling is a university established in Scotland by
charter at Stirling, FK9 4LA.  Privileged/Confidential Information may
be contained in this message.  If you are not the addressee indicated
in this message (or responsible for delivery of the message to such
person), you may not disclose, copy or deliver this message to anyone
and any action taken or omitted to be taken in reliance on it, is
prohibited and may be unlawful.  In such case, you should destroy this
message and kindly notify the sender by reply email.  Please advise
immediately if you or your employer do not consent to Internet email
for messages of this kind.