Dear Good Listers,

I am looking for code to calculate a value for the variables Solution1, Solution2 and Solution3. An input program has been included. The way in which each solution variable is computed is described below, as are two rules which need to be followed. I would be grateful for any assistance which can be provided. This is another tricky problem that is far beyond my humble skills.

Kind regards, Jonahtan

**************************************************************************************************************************.
DATA LIST / ID 1 v1 TO v34 2-35 solution1 TO solution3 36-38.
BEGIN DATA
11111511111111223334414541254325444103
21111511111155222224414541254325444000
31111511113111222224414541254325444000
41111511211111225224414541254325444000
51111444444111222224444431254444444020
61111443444111222224444431254444444010
71111511511111222224414541254325444000
81111511333111223334414541333325444003
END DATA.
**************************************************************************************************************************.
* There are three solution variables and each records a specific response pattern which may occur more than once with multiple response
* patterns possible for the same ID.
*
* COMPUTE SOLUTION 1 - If there are more than 7 consecutive integers of value '1' repeated in succession across v1-v34 solution1 = 1
* ELSE solution1 = 0.
* COMPUTE SOLUTION 2 - If there are more than 5 consecutive integers of value '4' repeated in succession across v1-v34 solution2 = 1
* ELSE solution2 = 0.
* COMPUTE SOLUTION 3 - If there are more than 2 consecutive integers of value '3' repeated in succession across v1-v34 solution3 = 1
* ELSE solution3 = 0.
*
* RULE A: IF any of these consecutive sets of strings (e.g., 444444) occurs more than once for the same ID then the count of these response
* sets should also be recorded as the solution variable value which is, in essence, a count of the occurrences of each response pattern. For
* example, for ID=8 there are 3 sets of consecutive integers of value '3' in separate successions across v1-v34 and this is why solution3 = 3.
*
* RULE B: Also note that it is possible for there to be more than one type of response pattern for each ID, for example ID=1 has two
* response patterns.
**************************************************************************************************************************.
|
Administrator
|
In reply to this post by Snuffy Dog
See VECTOR.
See LOOP. --
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Administrator
|
OTOH: The following will enumerate all sequences (since we've been down the VECTOR/LOOP path before).
I will leave it to you to implement the rather trivial logic to sniff out the logical conditions.
--
VARSTOCASES /MAKE long FROM v1 TO v34.
SPLIT FILE BY ID long.
COMPUTE @=1.
CREATE Counter=CSUM(@).
DO IF ID=LAG(ID) AND Counter LE LAG(COUNTER).
COMPUTE @=LAG(@)+1.
ELSE.
COMPUTE @=LAG(@).
END IF.
AGGREGATE OUTFILE * / BREAK ID @ long/ NC=MAX(Counter).
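For readers who want to see the "enumerate all sequences" idea outside SPSS syntax, here is a minimal Python sketch. It is an illustrative analogue, not a translation of the syntax above: the function name enumerate_runs is hypothetical, and the sample string is v1-v34 for ID 8 copied from the posted data. Treating a "set" of 3s as a run of at least three is an assumption.

```python
from itertools import groupby

def enumerate_runs(responses):
    """Return (value, run_length) for each run of identical characters."""
    return [(value, len(list(group))) for value, group in groupby(responses)]

# v1-v34 for ID 8, copied from the posted data
row8 = "1111511333111223334414541333325444"
runs = enumerate_runs(row8)

# Assumption: a qualifying "set" of 3s is a run of at least three of them
threes = sum(1 for value, length in runs if value == "3" and length >= 3)
print(threes)  # 3
```

Once the runs are listed this way, each solution variable reduces to counting the runs whose value and length meet the chosen threshold.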
|
In reply to this post by Snuffy Dog
Here is how I would go about it. In a nutshell, I concatenate all the values into a string variable, use the string index functions to find the patterns, and take out the matching substring whenever a pattern is found. You just need to loop over the maximum number of patterns that can actually be found (I believe I actually have one extra iteration for each loop except the last).

The code is repetitive enough that it could certainly be wrapped up in a macro. As far as I can tell, the solutions you posted in the data list are not correct, unless there are other stipulations. E.g. the first row has only 1 instance of two 3s in a row (counting them as mutually exclusive), and the fifth row has 3 examples of five consecutive 4s. I added an additional row to show it will count multiple instances of seven 1s in a row.

*****************************************************.
DATA LIST / ID 1 v1 TO v34 2-35 solution1 TO solution3 36-38.
BEGIN DATA
11111511111111223334414541254325444103
21111511111155222224414541254325444000
31111511113111222224414541254325444000
41111511211111225224414541254325444000
51111444444111222224444431254444444020
61111443444111222224444431254444444010
71111511511111222224414541254325444000
81111511333111223334414541333325444003
11111111111111123334414541254325444201
END DATA.

*SOLUTION 1.
string vall (A34).
compute vall = " ".
do repeat v = v1 to v34.
compute vall = CONCAT(RTRIM(vall),STRING(v,F1.0)).
end repeat.
exe.

string tempv (A34).
compute tempv = vall.
compute sol1 = 0.
*Search for 7 consecutive ones - then take out if found, then search again.
loop #i = 1 to 5.
compute #find = INDEX(tempv,"1111111").
if #find > 0 sol1 = sol1 + 1.
if #find > 0 tempv = CONCAT(substr(tempv,1,#find - 1),RTRIM(substr(tempv,#find+7))).
end loop.
exe.

*SOLUTION 2.
compute tempv = vall.
compute sol2 = 0.
*Search for 5 consecutive 4s - then take out if found, then search again.
loop #i = 1 to 7.
compute #find = INDEX(tempv,"44444").
if #find > 0 sol2 = sol2 + 1.
if #find > 0 tempv = CONCAT(substr(tempv,1,#find - 1),RTRIM(substr(tempv,#find+5))).
end loop.
exe.

*SOLUTION 3.
compute tempv = vall.
compute sol3 = 0.
*Search for 2 consecutive threes - then take out if found, then search again.
loop #i = 1 to 17.
compute #find = INDEX(tempv,"33").
if #find > 0 sol3 = sol3 + 1.
if #find > 0 tempv = CONCAT(substr(tempv,1,#find - 1),RTRIM(substr(tempv,#find+2))).
end loop.
exe.
*****************************************************.
|
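The find-and-remove idea in the syntax above can also be sketched in Python for comparison. This is only an illustration under stated assumptions: count_by_removal is a hypothetical name, a while loop stands in for the fixed iteration counts, and the sample string is v1-v34 for ID 5 from the posted data.

```python
def count_by_removal(responses, pattern):
    """Count a pattern by finding it, cutting it out, and searching again."""
    remaining = responses
    count = 0
    while True:
        pos = remaining.find(pattern)  # -1 when absent (SPSS INDEX returns 0)
        if pos < 0:
            break
        count += 1
        # Splice the string back together without the matched piece
        remaining = remaining[:pos] + remaining[pos + len(pattern):]
    return count

row5 = "1111444444111222224444431254444444"  # v1-v34 for ID 5
print(count_by_removal(row5, "44444"))  # 3
```

For patterns made of a single repeated digit, as in this thread, the result should agree with Python's built-in str.count, which also counts non-overlapping occurrences; the explicit loop is shown only because it mirrors the syntax step by step.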
Administrator
|
      s1     s2     s3
1   1.00    .00   1.00
2    .00    .00    .00
3    .00    .00    .00
4    .00    .00    .00
5    .00   2.00    .00
6    .00   1.00    .00
7    .00    .00    .00
8    .00    .00   3.00

Personally, I opted to not jump down the rabbit hole (this time). In my own production code, rather than butcher the data with V2C I would just build counters and run through the vector without the concatenation. OTOH: Such threads can linger and suck up my time. BUT! YMMV.
---
|
In reply to this post by Andy W
Here is a simple Python solution using the SPSSINC TRANS extension command from the SPSS Community site (www.ibm.com/developerworks/spssdevcentral).

First, it reads the variables as a single string. Then it uses a tiny function with SPSSINC TRANS to count the number of occurrences of the specified pattern in that string variable. re.findall returns a list of all the non-overlapping matches of the pattern, and the len function returns the length of the list.

DATA LIST / ID 1 v1v34 (A34) solution1 TO solution3 36-38.
BEGIN DATA
11111511111111223334414541254325444103
21111511111155222224414541254325444000
31111511113111222224414541254325444000
41111511211111225224414541254325444000
51111444444111222224444431254444444020
61111443444111222224444431254444444010
71111511511111222224414541254325444000
81111511333111223334414541333325444003
11111111111111123334414541254325444201
END DATA.
dataset name data.

begin program.
import re
def countpattern(v, pattern):
    return len(re.findall(pattern, v))
end program.

spssinc trans result=pattcount
/formula "countpattern(v1v34, '1111111')".

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Andy W <[hidden email]> To: [hidden email], Date: 03/19/2013 07:35 AM Subject: Re: [SPSSX-L] Detecting response patterns Sent by: "SPSSX(r) Discussion" <[hidden email]>
... snip original
----- Andy W [hidden email] http://andrewpwheeler.wordpress.com/
-- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Detecting-response-patterns-tp5718847p5718853.html Sent from the SPSSX Discussion mailing list archive at Nabble.com.
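The countpattern function can be tried on plain strings outside SPSS before wiring it into SPSSINC TRANS. The sketch below is illustrative only: the function body is the one from the post, the three strings are v1-v34 for IDs 1, 5 and 8 copied from the posted data, and the pattern literals (seven 1s, five 4s, three 3s) are assumptions chosen for the demonstration rather than an agreed reading of the rules.

```python
import re

def countpattern(v, pattern):
    # Same body as in the post: number of non-overlapping matches
    return len(re.findall(pattern, v))

# v1-v34 strings for IDs 1, 5 and 8, copied from the posted data
rows = {
    1: "1111511111111223334414541254325444",
    5: "1111444444111222224444431254444444",
    8: "1111511333111223334414541333325444",
}
patterns = ["1111111", "44444", "333"]  # assumed literals, one per solution

counts = {rid: [countpattern(v, p) for p in patterns] for rid, v in rows.items()}
for rid, c in counts.items():
    print(rid, c)
```

In SPSS itself, one SPSSINC TRANS call per pattern, each with its own result variable, would presumably give the three solution variables.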
|
Administrator
|
Nifty Jon!
-- Thought of the day: IBM needs to take all these nifty snake handler tricks (python extensions) and build them into the syntax language proper. Perhaps that would derail development resources from vital revolutionary endeavors such as making the Model Viewer remotely useful. --
|
I concur: regexes within native SPSS code would be wonderful (and this is a perfect example of how powerful they are compared to native SPSS functionality).
I also agree about jumping down the rabbit hole (I certainly waste too much of my time here doing that ...) |
Administrator
|
Rabbit Hole -->
Upside: The Mad Hatter cooks up some really powerful tea! Downside: That kwazy wabbit is hard to catch and too tough for stew! -----
|
In reply to this post by Snuffy Dog
I will point out that your counting rules alone are underspecified in a particular way, by giving an example. If there are six 3's in sequence, this could be considered (a) two sets of three 3s; (b) four sets of three 3s (starting in positions 1, 2, 3 or 4); or (c) one set of "at least three 3s." The example does rule out my solution b, but not my solution c.

I wonder if the Python solution allows for selecting between a and b.
-- Rich Ulrich

Date: Tue, 19 Mar 2013 21:06:38 +1100 From: [hidden email] Subject: Detecting response patterns To: [hidden email]
... snip original
|
I don't know why anyone would want (b). The Python code produces (a). It would also allow you to do (c) just by making the pattern be '333+' in the formula subcommand.

Jon Peck (no "h") aka Kim

From: Rich Ulrich <[hidden email]> To: [hidden email], Date: 03/19/2013 11:51 AM Subject: Re: [SPSSX-L] Detecting response patterns Sent by: "SPSSX(r) Discussion" <[hidden email]>
... snip original
|
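To make the difference between readings (a) and (c) concrete, here is a short Python sketch using the six-3s example from the thread. The last part is an assumption on my side: it reads "more than 5 consecutive 4s" as "a run of six or more", written as the regex 4{6,}, and applies it to v1-v34 for ID 5 from the posted data.

```python
import re

six_threes = "333333"

# (a) non-overlapping literal matches: two sets of three 3s
count_a = len(re.findall("333", six_threes))
# (c) one match per run of three or more 3s
count_c = len(re.findall("333+", six_threes))
print(count_a, count_c)  # 2 1

# Assumption: "more than 5 consecutive 4s" means a run of six or more
row5 = "1111444444111222224444431254444444"  # v1-v34 for ID 5
runs_of_six_plus = len(re.findall("4{6,}", row5))
print(runs_of_six_plus)  # 2
```

Under that run-based reading, ID 5 gets a count of 2, which is the value shown for solution2 in the posted data for that row; whether this is the intended rule is for the original poster to confirm.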
Thanks for the info about finding "3 or more".

You would want (b) if your task was to do something with "every variable having 3 followed by at least two more of them." I remember being annoyed by a word processor (used on data) that always did its searches this way.
-- Rich Ulrich

To: [hidden email] CC: [hidden email] Subject: Re: [SPSSX-L] Detecting response patterns From: [hidden email] Date: Tue, 19 Mar 2013 12:24:55 -0600
... snip original
|
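Reading (b), where every starting position counts, is not what re.findall gives with a plain literal, but it can be sketched with a zero-width lookahead. This is an illustration only; overlapping_starts is a hypothetical helper name, and positions are reported 1-based to match the wording in the thread.

```python
import re

def overlapping_starts(responses, pattern):
    """Return 1-based start positions of every (possibly overlapping) match."""
    # A zero-width lookahead consumes no characters, so the scan moves on
    # one position at a time and overlapping occurrences are all reported.
    lookahead = "(?=" + re.escape(pattern) + ")"
    return [m.start() + 1 for m in re.finditer(lookahead, responses)]

print(overlapping_starts("333333", "333"))  # [1, 2, 3, 4]
```

The length of the returned list is the count under reading (b), and the positions themselves identify "every variable having 3 followed by at least two more of them".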
Dear David, Andy, John and Rich,
I must say I'm surprised by the range of solutions provided. These were very useful indeed; I liked your solution, Andy. You were quite right, Andy: I did make a mistake on solution 3 on the first row. This problem that I posted, which is designed to identify problematic response patterns in surveys, highlights a problem that Rich refers to: depending on where you start the counting sequence, it's ambiguous how many instances there are in a parcelled string set. Does 3333 represent two strings of 3 or only one? Consistency artefacts in surveys are probably too infrequent to detect using latent variable methods, and the usual methods of detecting common method variance are problematic and probably statistically underidentified in a confirmatory factor analytic context, thus the more basic counting approach I'm trying to operationalize.

I did want to add that I feel so very obliged for the extraordinary help I receive from people on this list, and I don't know where many of you find the motivation and time. John and David have been helping me out with these sorts of problems for many years now. I don't want any of you to go to any trouble and I'm quite happy for people to ignore these questions if they look too burdensome. For the assistance that is provided I am very grateful. I used to program frequently in SPSS and at the time I started to get quite good at it, but I reached a point with it about 5 years ago where I burnt out and couldn't write code anymore without going crazy. Now I find my skills deficient on those rare occasions when I can only get my work done by using a program like SPSS, and yet I have to push forward with it, trying to hold the craziness at bay. Please don't go down any rabbit warrens on my account.

Best wishes, Jonahtan.
On Wed, Mar 20, 2013 at 12:16 PM, Rich Ulrich <[hidden email]> wrote:
|
Administrator
|
You are very welcome, Jonahtan (very interesting name; I initially parsed it as Jonathan).
BTW: There is no H in Jon (that's why I sometimes call him JoNoH ;-)

Anyway, as you have seen, there are usually many ways to skin a cat. You would be amazed at how I might start with something coded a given way and end up with something completely different after a few iterations. I have one mess which started as a SLOW 4-level nested loop in MATRIX and presently is one line of really FAST code. Had to analyze the hell out of it, but then BAM!

Re time and motivation: I have always had an instinctive desire to teach others. Time? My posts are typically 5-minute breaks from coding insanity (I can rarely stare at MATRIX and MACRO code for more than 2-3 hours without taking a break, so I drop by and throw down a few hints).

Yeah, it's definitely use it or lose it! For a while (about 7 years ago) I didn't have access to a computer to run SPSS and my skills became a bit rusty (that's after 10+ years of really intense mastery on a daily basis). Now they are as solid as they ever were (maybe even better than when I was at SPSS consulting). I suspect my scripting skills might need a bit of polish since I haven't needed to work with them much lately. Looking forward to diving face first into Python and GPL in the coming weeks for another project.

BUT think of it this way: it's like riding a bicycle. You never really completely forget how to do it. Rabbit holes can sometimes be interesting!

David