SPSS running slow


Re: SPSS running slow

David Marso
Administrator
Well, I had imagined as much, and I am suggesting you back out of that slow-motion train wreck and think about normalizing the data. Any analysis you attempt with 27,000 columns will be SLOW! Any dialog boxes will be virtually inoperable. TRAIN WRECK!

GauravSrivastava wrote
it's a bit different .. see below
CaseID  Brand1_Attrib01  Brand1_Attrib02  ....  Brand350_Attrib60
1       1                1                      0
2       0                1                      1
3       1                0                      1
.       .                .                      .
.       .                .                      .
800     1                1                      .


Regards,
Gaurav


On Mon, Dec 17, 2012 at 10:56 PM, David Marso <[hidden email]> wrote:

> That is a completely unmanageable data arrangement.
> --
> Consider:
> Partially normalized
> CaseID  Brand  Attrib01 .... Attrib60
> 1       1
> 1       ...
> 1       350
> 2       1
> 2       ...
> 2       350
> ----
>
> OR Fully normalized.
>
> CASEID  Brand  Attrib  Value
> 1       1      1       00100101
> 1       .....
> 1       1      60      00100160
> ....
> 1       350    60      00135060
> ...
> 800     350    60      80035060
> ----------------
>
> GauravSrivastava wrote
> > Hi Gene,
> >
> > Yes, there are 27K variables, but not many cases: only approx. 800.
> > Actually my data is looped over brands (350 brands) by brand attributes
> > (approx. 60). Since it's a tracker, there are many variables which we
> > kept to keep the data consistent. Hope this gives you a clear picture.
> >
> > Regards,
> > Gaurav
> >
> >
> > On Mon, Dec 17, 2012 at 7:22 PM, Maguin, Eugene <emaguin@> wrote:
> >
> >> Gaurav,
> >>
> >> I’m curious about this problem you’re having with your dataset. Let’s
> >> talk about the dataset. Are you saying you have 27,000 (thousand)
> >> variables in the file? How many cases in the file?
> >>
> >> Gene Maguin
> >>
> >> From: SPSSX(r) Discussion [mailto:SPSSX-L@.UGA] On Behalf Of I Am Gaurav
> >> Sent: Monday, December 17, 2012 6:57 AM
> >> To: SPSSX-L@.UGA
> >> Subject: Re: SPSS running slow
> >>
> >> Thanks for all your response.
> >>
> >> I am not sure if there is any specific requirement with MATCH FILES
> >> syntax.
> >>
> >> I did it easily using
> >>
> >> SAVE OUTFILE = "C:\Users\GGAURAVS\Downloads\abc.sav"
> >>  /KEEP
> >> respid ... till 27K variable.
> >> exe.
> >>
> >> Regards,
> >> Gaurav
> >>
> >> On Mon, Dec 17, 2012 at 4:48 PM, David Marso <david.marso@> wrote:
> >>
> >> See item 02 of my horrible practices list.
> >>
> >> GauravSrivastava wrote
> >> > Hi David,
> >> >
> >> > I still couldn't figure out the problem with my SPSS. I am trying to
> >> > reorder my SPSS file using the syntax below:
> >> > MATCH FILES FILE=*/KEEP
> >> > respid ..... till all 27K variable.
> >> > exe.
> >> >
> >> > But my SPSS is running very slowly: it has been running for the last
> >> > 2 hours with no result. Can you suggest any help?
> >> >
> >> > Regards,
> >> > Gaurav
> >>
> >> -----
> >> Please reply to the list and not to my personal email.
> >> Those desiring my consulting or training services please feel free to
> >> email me.
> >> --
> >> View this message in context:
> >> http://spssx-discussion.1045642.n5.nabble.com/SPSS-running-slow-tp5716941p5716977.html
> >> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
> >>
> >> =====================
> >> To manage your subscription to SPSSX-L, send a message to
> >> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
> >> command. To leave the list, send the command
> >> SIGNOFF SPSSX-L
> >> For a list of commands to manage subscriptions, send the command
> >> INFO REFCARD
> View this message in context:
> http://spssx-discussion.1045642.n5.nabble.com/SPSS-running-slow-tp5716941p5716989.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
"Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: SPSS running slow

Maguin, Eugene
Hey David, you've used the term 'normalize' a number of times in recent days and I'm unclear what exactly you mean by it. Would you educate me a bit? (Gently with the clue stick [probably, :-) is needed here]).
Gene Maguin


Re: SPSS running slow

Ruben Geert van den Berg
In reply to this post by John F Hall
Dear John,

The square brackets enclosing the second data source indicate that it's optional rather than mandatory.

It's explained right at the start of Universals in the FM, which indeed is sometimes more F than other times...

Best,

Ruben


Date: Mon, 17 Dec 2012 13:04:21 +0100
From: [hidden email]
Subject: Re: SPSS running slow
To: [hidden email]

FM not so F then?

John F Hall (Mr)
Email:      [hidden email]
Website:    www.surveyresearch.weebly.com

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: 17 December 2012 12:22
To: [hidden email]
Subject: Re: SPSS running slow

Using MATCH FILES on a single file is perfectly fine.
It can be used to RENAME, DROP, Flag FIRST/LAST occurrences etc.

John F Hall wrote
> MATCH FILES needs at least one more file.  See page 1141 of the FM.
> John F Hall (Mr)
> Email:      johnfhall@
> Website:    www.surveyresearch.weebly.com
> <SNIP FM details>

-----
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/SPSS-running-slow-tp5716941p5716978.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.


Re: SPSS running slow

Ruben Geert van den Berg
In reply to this post by David Marso
Dear all,

I think this post may be relevant as well: http://listserv.uga.edu/cgi-bin/wa?A2=ind0506&L=spssx-l&P=45672. Especially the last bit is interesting, as it suggests that vast numbers of variables are more problematic than vast numbers of cases.

So indeed, "long" may be a better idea than "wide" as David already indicated.

Best,

Ruben

P.s. I think "normalization" here refers to Database Normalization, see: http://en.wikipedia.org/wiki/Database_normalization.
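
To put rough numbers on the wide versus long trade-off discussed in this thread, here is a small Python sketch (Python is used only for the arithmetic; it is not SPSS syntax). It takes the figures quoted by the original poster (about 800 cases, 350 brands, roughly 60 attributes) and works out the row and column counts of the three layouts David outlined. The variable names are made up for illustration, and the key-column counts are an assumption about how each layout would be stored.

```python
# Figures quoted in the thread; the attribute count is approximate ("approx 60").
cases, brands, attribs = 800, 350, 60

# (rows, columns) per layout. Key columns are counted alongside the data columns.
wide_shape = (cases, 1 + brands * attribs)      # CaseID + Brand1_Attrib01 .. Brand350_Attrib60
partial_shape = (cases * brands, 2 + attribs)   # CaseID, Brand + Attrib01 .. Attrib60
full_shape = (cases * brands * attribs, 4)      # CaseID, Brand, Attrib, Value

layouts = [
    ("wide", wide_shape),
    ("partially normalized", partial_shape),
    ("fully normalized", full_shape),
]
for label, (rows, cols) in layouts:
    print(f"{label:>21}: {rows:>10,} rows x {cols:>6,} columns")
```

All three layouts hold about the same number of data cells; what changes is whether the size sits in the columns or in the rows. Note that 350 x 60 accounts for 21,000 columns, so the remainder of the reported 27K is presumably other tracker variables that the thread does not itemize.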



Re: SPSS running slow

Jon K Peck
Bear in mind that these posts are all quite old, so the places where you might run into trouble have moved farther out, especially if you are using a 64-bit version of Statistics, but the general message still applies - there is a lot of overhead in lugging around huge numbers of variables, not to mention the maintenance and management of extremely wide datasets.  Narrow is good.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621





Re: SPSS running slow

David Marso
Administrator
In reply to this post by Maguin, Eugene
In this case I mean merely a long rather than a wide data representation.
However, consider a case where there was also information about the question (say, the sub-form of a questionnaire). Encoding that in the current format would require 350 x 60 new, completely redundant columns.
In the long format one would create a table with 350 rows with the question number as a key.
Use a MATCH with a TABLE to associate that additional info. Say there was also subject info (that would be another table with 800 rows).
If one simply stores the subID, question_number and attributes with the value 1, it is a simple matter to use CASESTOVARS or VECTOR -> AGGREGATE to build out the wide version if and when one might (can't imagine why) require it. Ultimately it depends upon the end use of the data, but it is rarely (if ever) a great idea to build out 20K+ columns.
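
For readers who want to see the idea above in miniature, here is a rough Python sketch (plain standard library, not SPSS syntax, and not anyone's actual production code). It uses toy sizes (2 cases, 3 brands, 2 attributes) in place of 800 x 350 x 60. Every name in it (`wide_rows`, `brand_info`, `to_long`, `attach_lookup`, `to_wide`, the `BrandName` field) is hypothetical. It assumes the wide cells are strictly 0 or 1: storing only the 1s cannot tell a 0 from a missing value, so data with missing cells would need an explicit Value column instead.

```python
# Toy stand-in for the wide file: one row per case, one 0/1 column per
# brand/attribute pair. Sizes and names are illustrative assumptions.
N_BRANDS, N_ATTRIBS = 3, 2

def wide_name(brand, attrib):
    """Column name in the wide layout, e.g. Brand1_Attrib01."""
    return f"Brand{brand}_Attrib{attrib:02d}"

def to_long(rows):
    """Wide -> long: keep one (CaseID, Brand, Attrib) row per cell equal to 1."""
    out = []
    for row in rows:
        for brand in range(1, N_BRANDS + 1):
            for attrib in range(1, N_ATTRIBS + 1):
                if row.get(wide_name(brand, attrib)) == 1:
                    out.append({"CaseID": row["CaseID"], "Brand": brand, "Attrib": attrib})
    return out

def attach_lookup(rows, lookup, key):
    """Table lookup: copy the lookup record's fields onto every row sharing its key."""
    return [{**row, **lookup[row[key]]} for row in rows]

def to_wide(rows, case_ids):
    """Long -> wide: start every cell at 0, then switch on the stored 1s."""
    wide = {}
    for cid in case_ids:
        wide[cid] = {"CaseID": cid}
        for brand in range(1, N_BRANDS + 1):
            for attrib in range(1, N_ATTRIBS + 1):
                wide[cid][wide_name(brand, attrib)] = 0
    for row in rows:
        wide[row["CaseID"]][wide_name(row["Brand"], row["Attrib"])] = 1
    return [wide[cid] for cid in case_ids]

wide_rows = [
    {"CaseID": 1, "Brand1_Attrib01": 1, "Brand1_Attrib02": 0,
     "Brand2_Attrib01": 0, "Brand2_Attrib02": 1,
     "Brand3_Attrib01": 0, "Brand3_Attrib02": 0},
    {"CaseID": 2, "Brand1_Attrib01": 0, "Brand1_Attrib02": 1,
     "Brand2_Attrib01": 1, "Brand2_Attrib02": 1,
     "Brand3_Attrib01": 0, "Brand3_Attrib02": 1},
]

# Brand-level information lives once per brand instead of once per column.
brand_info = {
    1: {"BrandName": "Alpha"},
    2: {"BrandName": "Beta"},
    3: {"BrandName": "Gamma"},
}

long_rows = to_long(wide_rows)
with_info = attach_lookup(long_rows, brand_info, "Brand")
rebuilt = to_wide(long_rows, [1, 2])

for row in with_info:
    print(row)
print("round trip matches original:", rebuilt == wide_rows)
```

The point of the design is that brand-level (or question-level, or subject-level) information is stored once in its own small table and attached by key when needed, and the wide file can still be regenerated on demand rather than carried around permanently.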

Maguin, Eugene wrote
Hey David, you've used the term 'normalize' a number of times in recent days and I'm unclear what exactly you mean by it. Would you educate me a bit? (Gently with the clue stick [probably, :-) is needed here]).
Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: Monday, December 17, 2012 2:09 PM
To: [hidden email]
Subject: Re: SPSS running slow

Well,  I had imagined so much and am suggesting you back out of that slow motion train wreck and think about normalizing the data.  Any analysis you attempt with 27,000 columns will be SLOW! Any dialog boxes will be virtually inoperable.  TRAIN WRECK!


GauravSrivastava wrote
> it's a bit different .. see below
> CaseID  Brand1_Attrib01 Brand1_Attrib02....Brand350_Attrib60
> 1            1                          1
> 0
> 2            0                          1
> 1
> 3            1                          0
> 1
> .              .                          .
>     .
> .             .                            .
>     .
> 800         1                          1
> .
>
>
> Regards,
> Gaurav
>
>
> On Mon, Dec 17, 2012 at 10:56 PM, David Marso <

> david.marso@

> > wrote:
>
>> That is a completely unmanagable data arrangement.
>> --
>> Consider:
>> Partially normalized
>> CaseID  Brand  Attrib01....Attrib60.
>> 1            1
>> 1       ...
>> 1          350
>> 2            1
>> 2          ...
>> 2         350
>> ----
>>
>> OR Fully normalized.
>>
>> CASEID Brand Attrib Value
>> 1          1        1        00100101
>> 1        .....
>> 1          1       60       00100160
>> ....
>> 1        350     60       00135060
>> ...
>> 800      350   60       80035060
>> ----------------
>>
>> GauravSrivastava wrote
>> > Hi Gene,
>> >
>> > Yes, Variables are 27K  but caes are not too much. it's only approx.
>> 800.
>> > actually my data is in loop with brand (350 brands) vs brands
>> > attribute (approx 60). Since it'a a tracker so there are many
>> > variable which we
>> kept
>> > to keep my data consistent. hope this give you a clear picture.
>> >
>> > Regards,
>> > Gaurav
>> >
>> >
>> > On Mon, Dec 17, 2012 at 7:22 PM, Maguin, Eugene <
>>
>> > emaguin@
>>
>> > > wrote:
>> >
>> >> Gaurav, ****
>> >>
>> >> ** **
>> >>
>> >> I’m curious about this problem you’re having with your dataset.
>> >> Let’s talk about the dataset. Are you saying you have 27,000
>> >> (thousand) variables
>> in
>> >> the file? How many cases in the file? ****
>> >>
>> >> ** **
>> >>
>> >> Gene Maguin ****
>> >>
>> >> ** **
>> >>
>> >> *From:* SPSSX(r) Discussion [mailto:
>>
>> > SPSSX-L@.UGA
>>
>> > ] *On Behalf
>> >> Of *I Am Gaurav
>> >> *Sent:* Monday, December 17, 2012 6:57 AM
>> >> *To:*
>>
>> > SPSSX-L@.UGA
>>
>> >> *Subject:* Re: SPSS running slow****
>> >>
>> >> ** **
>> >>
>> >> Thanks for all your response. ****
>> >>
>> >> I am not sure if there is any specific requirement with MATCH
>> >> FILES syntax.
>> >> ****
>> >>
>> >> I did it easily using ****
>> >>
>> >> SAVE OUTFILE = "C:\Users\GGAURAVS\Downloads\abc.sav"****
>> >>
>> >>  /KEEP****
>> >>
>> >> respid ... till 27K variable.****
>> >>
>> >> exe.****
>> >>
>> >> ** **
>> >>
>> >> Regards,****
>> >>
>> >> Gaurav****
>> >>
>> >> ** **
>> >>
>> >> On Mon, Dec 17, 2012 at 4:48 PM, David Marso <
>>
>> > david.marso@
>>
>> > >
>> >> wrote:****
>> >>
>> >> See item 02 of my horrible practices list.
>> >>
>> >> GauravSrivastava wrote
>> >> > Hi David,
>> >> >
>> >> > Still I couldn't figure out the problem with my SPSS. I am
>> >> > trying to reorder my SPSS file using below syntax:
>> >> > MATCH FILES FILE=*/KEEP
>> >> > respid ..... till all 27K variable.
>> >> > exe.
>> >> >
>> >> > But My spss is running very slow and running from last 2 hour
>> >> > but no outcome. Can you suggest any help?
>> >> >
>> >> > Regards,
>> >> > Gaurav
-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/SPSS-running-slow-tp5716941p5716991.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Maguin, Eugene wrote
Hey David, you've used the term 'normalize' a number of times in recent days and I'm unclear what exactly you mean by it. Would you educate me a bit? (Gently with the clue stick [probably, :-) is needed here]).
Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso
Sent: Monday, December 17, 2012 2:09 PM
To: [hidden email]
Subject: Re: SPSS running slow

Well,  I had imagined so much and am suggesting you back out of that slow motion train wreck and think about normalizing the data.  Any analysis you attempt with 27,000 columns will be SLOW! Any dialog boxes will be virtually inoperable.  TRAIN WRECK!


Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: SPSS running slow

Eero Olli
In reply to this post by GauravSrivastava

Hi Gaurav,

 

You can learn more about the principles behind normalization, for example, here:

http://en.wikipedia.org/wiki/Database_normalization

SPSS is not a relational database. However, you should still try to normalize the dataset you want to analyze (in database terms, a dataset is called a table).

The suggestions given here by other list members are sound: make one long file (or several). I would also put effort into making good value labels.
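To make the "one long file" idea concrete, here is a minimal sketch of the wide-to-long reshaping. It is written in plain Python rather than SPSS syntax, purely as an illustration. The column naming pattern (Brand<n>_Attrib<nn>) is taken from the layout posted earlier in this thread; the sample values, the function name wide_to_long and the output field names are assumptions made up for the example.

```python
import re

# Hypothetical wide data: one record per case, one column per
# brand/attribute pair (only 2 brands x 2 attributes shown here).
wide_rows = [
    {"CaseID": 1, "Brand1_Attrib01": 1, "Brand1_Attrib02": 1,
     "Brand2_Attrib01": 0, "Brand2_Attrib02": 1},
    {"CaseID": 2, "Brand1_Attrib01": 0, "Brand1_Attrib02": 1,
     "Brand2_Attrib01": 1, "Brand2_Attrib02": None},  # None = missing
]

# Column names are assumed to follow the pattern shown in the thread.
COLUMN_PATTERN = re.compile(r"^Brand(\d+)_Attrib(\d+)$")


def wide_to_long(rows, id_field="CaseID"):
    """Return one record per (case, brand, attribute) combination."""
    long_rows = []
    for row in rows:
        for name, value in row.items():
            match = COLUMN_PATTERN.match(name)
            if match is None:
                continue  # skips the id field and any other column
            long_rows.append({
                id_field: row[id_field],
                "Brand": int(match.group(1)),
                "Attrib": int(match.group(2)),
                "Value": value,
            })
    return long_rows


long_rows = wide_to_long(wide_rows)
for record in long_rows:
    print(record)
```

With the real file this would turn 800 cases by roughly 21,000 brand/attribute columns into a file with four columns and many rows, which is the fully normalized layout David sketched. Inside SPSS itself the VARSTOCASES command is the usual tool for this kind of restructuring.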

 

You have given very little information about what you are trying to achieve with your analysis. Creating multiple data files might suit your needs better. I will give one example of how the analytical interest can guide the process of finding a useful form for the data file:

I was once trying to find out which assumptions behind a theory were the most useful. Basically, I wanted to compare 800 cross tables with each other. I created a process that was looped 40 times, once for each possible combination of assumptions: one set of assumptions => coding of data => a unique data file with 18 variables of interest => analysis of data => selected results saved in a common data file. In other words, I had 40 temporary data files, each consisting of 5000 cases from a survey, and each pass added only 20 new lines to the common data file. Each case in the common data file represented one cross table: the names of the variables used; the number of categories in those variables; the number of categories with fewer than 20 respondents; a measure of the strength of the relationship between the two variables; and a description of the assumptions used. Afterwards, I analyzed the common data file for patterns that allowed me to make informed decisions about the usefulness of the assumptions. Later, the same procedure was replicated using a different survey. (Once I was happy with the assumptions and the corresponding coding of variables, I created a proper data file with all 250 variables for more typical analyses.)
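The loop described above can be sketched roughly as follows, again in plain Python as an illustration and not as the syntax I actually ran. Everything named here is hypothetical: the toy survey, the two cut-off "assumptions", the helper functions and the sparse-cell threshold of 2 (the real one was 20). The measure of association is left out to keep the sketch short.

```python
from collections import Counter
from itertools import product

# Toy stand-in for the survey (the real one had 5000 cases).
survey = [
    {"age": 25, "income": 15},
    {"age": 35, "income": 30},
    {"age": 45, "income": 60},
    {"age": 55, "income": 25},
    {"age": 28, "income": 55},
    {"age": 38, "income": 18},
]


def recode(cases, age_cut, income_cut):
    """Build a temporary data set under one set of coding assumptions."""
    return [
        {
            "age_group": "old" if case["age"] >= age_cut else "young",
            "income_group": "high" if case["income"] >= income_cut else "low",
        }
        for case in cases
    ]


def summarize_crosstab(cases, row_var, col_var, min_count=2):
    """Describe one cross table as a single record."""
    cells = Counter((case[row_var], case[col_var]) for case in cases)
    return {
        "row_var": row_var,
        "col_var": col_var,
        "n_cells": len(cells),  # observed (non-empty) cells only
        "n_sparse_cells": sum(1 for n in cells.values() if n < min_count),
    }


# The "common data file": one record per cross table per assumption set.
common_file = []
for age_cut, income_cut in product((30, 40), (20, 50)):
    temp_file = recode(survey, age_cut, income_cut)  # discarded each pass
    summary = summarize_crosstab(temp_file, "age_group", "income_group")
    summary["assumptions"] = f"age_cut={age_cut}, income_cut={income_cut}"
    common_file.append(summary)

for record in common_file:
    print(record)
```

The point of the design is that only the small summary records survive each pass; the recoded data set is rebuilt from scratch under each set of assumptions, so variables coded under different assumptions can never be mixed by accident.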

 

Another way to do the same would have been to keep 40 variations of each of the 18 variables I was using for this analysis, which would have led to a data file with 720 variables. However, the coding of the data was complicated and was isolated in separate syntax files to ensure that every stage of the analysis used the same coding, which was hard to achieve with a wide data file. In addition, analyzing a wide data file would have been hard, even though it has a conceptual simplicity (the cases are the same 5000, even if the missing values differ, and each variation of a variable is present in the data file), because one must be consistent with the assumptions in each analysis. Thus, out of the 720 variables I would have needed to find the 18 that can be used together. I am a strong believer in human error: if it is possible to make mistakes, one will. Therefore, all in all, it was much better to create 40 temporary data files than one data file 40 times as wide.

 

Best,

Eero Olli

 

Eero Olli                                                                          phone +47 23 15 73 44

Senior Adviser at Equality- and Anti-Discrimination Ombuds office

Mail: Post office box 8048 Dep, 0031 Oslo

Visits: Mariboesgate 13, Oslo

www.ldo.no

 

 

 

 
