Dear SPSSers.
I am wondering if my computer is having issues or whether the current processing speed is within the range of expected performance for version 23 and the following hardware.

OS Windows Enterprise 7.
Intel Core i7 CPU @ 3.40 GHz
RAM 8.0 GB
64 Bit

The PARTY dataset contains 6.8 million records and 84 variables (mostly STRING) and the COUNSEL file contains 1.3 million records. I have read over the numerous posts concerning processing speed, however, I was unable to find anything current on the topic. I have reduced the number of variables to see if that speeds up processing, but I did not see any improvements. Lastly, I did see significant improvements with using the EXECUTE command sparingly. The following syntax took about 25-30 minutes to process. The time does not include the initial query from a SQL server.

Please let me know if you think this is par for the course, or whether there are other approaches to improve performance.

Thank you
Damir

DATASET ACTIVATE PARTY WINDOW=FRONT .
DATASET COPY COUNSEL .
DATASET ACTIVATE COUNSEL WINDOW=FRONT .

* MATCH FILES FILE = * / KEEP = CASEID PartyTypeDescription .
* EXECUTE .

COMPUTE SOQ = DATE.MDY (04,06,2018) .
COMPUTE SOQT = TIME.HMS (09,27,00) .
COMPUTE START_OQ = (SOQ + SOQT) .
FORMATS START_OQ (DATETIME22) .
*EXECUTE .

MATCH FILES FILE=* / DROP = SOQ SOQT .
*EXECUTE .

RECODE PartyTypeDescription
  ("ATTORNEY"=1)
  ("ASSISTANT PUBLIC DEFENDER" = 2)
  ("PUBLIC DEFENDER"=2)
  ("ASSISTANT STATE ATTORNEY" =3)
  ("SPECIAL PROSECUTOR / ASA"=3) INTO COUNSEL_TYPE .
*EXECUTE .

SELECT IF NOT MISSING (COUNSEL_TYPE) .
*EXECUTE .

COMPUTE END_PROC = $TIME .
FORMATS END_PROC (DATETIME22) .
*EXECUTE .

* Date and Time Wizard: TIME2_PROC.
COMPUTE TIME2_PROC=DATEDIF(END_PROC, START_OQ, "minutes").
VARIABLE LABELS TIME2_PROC.
VARIABLE LEVEL TIME2_PROC (SCALE).
FORMATS TIME2_PROC (F5.0).
VARIABLE WIDTH TIME2_PROC(5).
EXECUTE.

*DATASET ACTIVATE COUNSEL.
FREQUENCIES VARIABLES=PartyTypeDescription COUNSEL_TYPE
  /ORDER=ANALYSIS.
-- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Looks like the only thing of substance is the RECODE. Everything else looks like benchmark code. 25 minutes sounds excessive for a mere 6.8 M records.

-----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
In reply to this post by DKUKEC
All of the EXECUTE statements can be removed, and copying the dataset is unnecessary. The original dataset won't be overwritten unless you explicitly save the altered version of the data.

On Fri, Apr 6, 2018 at 9:27 AM, DKUKEC <[hidden email]> wrote:
> Dear SPSSers.
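A minimal sketch of what the posted job might look like with the DATASET COPY and the EXECUTE lines gone and the timing variables left out. It assumes PARTY is the active query result and that nothing later in the session needs the unselected cases: SELECT IF drops them from the active dataset, although the data on the SQL server is untouched.

```spss
* Sketch only: same RECODE and selection, no DATASET COPY, no EXECUTE.
* Assumes PARTY is already loaded from the SQL query.
DATASET ACTIVATE PARTY WINDOW=FRONT.

RECODE PartyTypeDescription
  ("ATTORNEY"=1)
  ("ASSISTANT PUBLIC DEFENDER"=2)
  ("PUBLIC DEFENDER"=2)
  ("ASSISTANT STATE ATTORNEY"=3)
  ("SPECIAL PROSECUTOR / ASA"=3)
  INTO COUNSEL_TYPE.

* Keep only the cases that matched one of the RECODE values.
SELECT IF NOT MISSING(COUNSEL_TYPE).

* FREQUENCIES is a procedure, so it triggers the data pass.
FREQUENCIES VARIABLES=PartyTypeDescription COUNSEL_TYPE
  /ORDER=ANALYSIS.
```

RECODE and SELECT IF are transformations, so they stay pending until FREQUENCIES reads the data and should then be applied in that same pass.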
As Rick said, ditch the EXECUTEs and the DATASET COPY. Without them, I think the whole syntax would run in one data pass. I suspect that the DATASET COPY may have eaten a lot of the time. I agree that the time seems quite excessive. It might be worth monitoring resource usage with the Task Manager to see what is using up resources.

Also, if the string variables are very wide, that adds overhead. If they might have a lot of empty space, ALTER TYPE with A=AMIN would reclaim that.

On Sat, Apr 7, 2018 at 11:38 AM Rick Oliver <[hidden email]> wrote:
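A hedged sketch of the string-width suggestion in the reply above. ALTER TYPE ALL (A=AMIN) asks SPSS to resize every string variable to the narrowest width that still holds its longest value. It has to read the data to find those widths, so the assumption here is that it is run once, right after the SQL query, rather than inside every job.

```spss
* Sketch: run once after loading PARTY, before the transformations.
DATASET ACTIVATE PARTY.

* Shrink every string variable to the minimum width its values need.
ALTER TYPE ALL (A=AMIN).

* Optional check: list the variables and their new formats.
DISPLAY DICTIONARY.
```

Whether this helps depends on how much padding the 84 mostly-string variables actually carry, which the thread does not say.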