Hello again :) I have a dataset like this:

ID,var1
1,a) żaba żabka albo żabeczka b) łapka
2,a) ryba rybka rybeńka maleńka (np.sledzik) b) kotek c) piesek d) chomieczek
3,a) zenon b) marian i hela c) alekasadra(ola)

and I want to get this:

ID, var2, var3, var4, var5
1, a) żaba żabka albo żabeczka, b) łapka
2, a) ryba rybka rybeńka maleńka (np.sledzik), b) kotek, c) piesek, d) chomieczek
3, a) zenon, b) marian i hela, c) alekasadra(ola),

To do this I run char.index to find where "a)", "b)", "c)" and "d)" occur, and substr to split the text. It works as long as the text contains no Polish letters such as "ż" or "ł". The problem is that substr counts each of those letters as 2 characters. See the example below. Maybe you can show me another method that does this and keeps the Polish letters?

**********************************
*without polish letters
**********************************

data list list /ID(f8.0) var1(a90).
begin data.
1 'a) zaba zabka albo zabeczka b) lapka'
2 'a) ryba rybka rybenka malenka (np.sledzik) b) kotek c) piesek d) chomieczek'
3 'a) zenon b) marian i hela c) alekasadra(ola)'
4
5
6
7
8
9
10
end data.
execute.
DATASET NAME base1.
DATASET ACTIVATE base1.

compute a=CHAR.INDEX(var1, 'a)').
compute b=CHAR.INDEX(var1, 'b)').
compute c=CHAR.INDEX(var1, 'c)').
compute d=CHAR.INDEX(var1, 'd)').
execute.

string var2 to var5(a60).

do if a<>0 and b<>0.
  compute var2=SUBSTR(var1, a, b-a).
else if a<>0 and b=0.
  compute var2=SUBSTR(var1, a, 90).
end if.
execute.

do if b<>0 and c<>0.
  compute var3=SUBSTR(var1, b, c-b).
else if b<>0 and c=0.
  compute var3=SUBSTR(var1, b, 90).
end if.
execute.

do if c<>0 and d<>0.
  compute var4=SUBSTR(var1, c, d-c).
else if c<>0 and d=0.
  compute var4=SUBSTR(var1, c, 90).
end if.
execute.

do if d<>0.
  compute var5=SUBSTR(var1, d, 90).
end if.
execute.

**********************************
*with polish letters
**********************************

data list list /ID(f8.0) var1(a90).
begin data.
1 'a) żaba żabka albo żabeczka b) łapka'
2 'a) ryba rybka rybeńka maleńka (np.sledzik) b) kotek c) piesek d) chomieczek'
3 'a) zenon b) marian i hela c) alekasadra(ola)'
4
5
6
7
8
9
10
end data.
execute.
DATASET NAME base2.
DATASET ACTIVATE base2.

compute a=CHAR.INDEX(var1, 'a)').
compute b=CHAR.INDEX(var1, 'b)').
compute c=CHAR.INDEX(var1, 'c)').
compute d=CHAR.INDEX(var1, 'd)').
execute.

string var2 to var5(a60).

do if a<>0 and b<>0.
  compute var2=SUBSTR(var1, a, b-a).
else if a<>0 and b=0.
  compute var2=SUBSTR(var1, a, 90).
end if.
execute.

do if b<>0 and c<>0.
  compute var3=SUBSTR(var1, b, c-b).
else if b<>0 and c=0.
  compute var3=SUBSTR(var1, b, 90).
end if.
execute.

do if c<>0 and d<>0.
  compute var4=SUBSTR(var1, c, d-c).
else if c<>0 and d=0.
  compute var4=SUBSTR(var1, c, 90).
end if.
execute.

do if d<>0.
  compute var5=SUBSTR(var1, d, 90).
end if.
execute.
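The mismatch described above can be reproduced outside SPSS. What follows is a minimal standalone Python 3 sketch, not SPSS syntax. It assumes the text is stored as UTF-8 and only imitates what happens when a position counted in characters is used for a cut counted in bytes; all variable names are illustrative.

```python
# Standalone Python 3 illustration (not SPSS syntax): character offsets
# versus UTF-8 byte offsets for the first sample row.
text = "a) żaba żabka albo żabeczka b) łapka"
raw = text.encode("utf-8")

char_pos = text.index("b)")    # offset of "b)" counted in characters
byte_pos = raw.index(b"b)")    # offset of "b)" counted in bytes

# Each ż and ł occupies two bytes in UTF-8, so the byte string is longer.
extra_bytes = len(raw) - len(text)

# Cutting by characters gives the intended first block.
by_chars = text[:char_pos].rstrip()

# Applying the character offset to the byte string cuts too early.
by_bytes = raw[:char_pos].decode("utf-8", errors="replace")

print(char_pos, byte_pos, extra_bytes)
print(by_chars)
print(by_bytes)
```

The second printed block loses its last two letters, one for each extra byte that precedes the cut point beyond what fits. In other rows a cut could also fall in the middle of a two-byte letter.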
Use char.substr. That works on characters regardless of the number of bytes. This could also be done with a regular expression and spssinc trans with much less code.

(In reply to 88Videoclips . <[hidden email]>, Tue, Jul 4, 2017 at 6:14 PM.)
In reply to this post by 88videos
Here is a shorter version of your syntax that appears to work.
DO REPEAT v = a b c d / s = 'a)' 'b)' 'c)' 'd)'.
- COMPUTE v=CHAR.INDEX(var1,s).
END REPEAT.
STRING var2 to var5(a60).
DO REPEAT a = a b c / b = b c d / v = var2 var3 var4.
- DO IF a NE 0.
-   IF b NE 0 v=CHAR.SUBSTR(var1, a, b-a).
-   IF b EQ 0 v=CHAR.SUBSTR(var1, a, 90).
- END IF.
END REPEAT.
IF d NE 0 var5=CHAR.SUBSTR(var1, d, 90).
FORMATS a to d (F5.0).
LIST var2 to var5.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
These patterns are all at risk, though, from parentheses in the main text. Case 2 has k) and case 3 has a) inside parentheses. If a) to d) could appear in the text, a smarter algorithm that ignores matched parentheses would be needed.

(In reply to Bruce Weaver <[hidden email]>, Wed, Jul 5, 2017 at 6:38 AM.)
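One way to read "ignore matching parentheses" is to track nesting depth and accept a marker only when its ) does not close an open (. The following is a minimal standalone Python 3 sketch under that assumption, not SPSS syntax. The function names naive_markers and top_level_markers are hypothetical, and the a to d marker set is taken from the example data.

```python
import re

MARKER = re.compile(r"[a-d]\)")

def naive_markers(text):
    """Return the offset of every a) .. d), wherever it occurs."""
    return [m.start() for m in MARKER.finditer(text)]

def top_level_markers(text):
    """Return marker offsets whose ")" does not close an open "("."""
    found = []
    depth = 0                      # number of "(" currently left open
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth > 0:
                depth -= 1         # this ")" closes a group, not a marker
            elif i > 0 and text[i - 1] in "abcd":
                found.append(i - 1)
    return found

row3 = "a) zenon b) marian i hela c) alekasadra(ola)"
print(naive_markers(row3))       # includes the a) at the end of (ola)
print(top_level_markers(row3))   # only the three block markers
```

Like the simple search, this sketch still treats any a to d letter followed by an unmatched ) as a marker, so a word ending in such a letter directly before a stray ) would be picked up too.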
In reply to this post by 88videos
First of all use CHAR.SUBSTR and CHAR.INDEX.
Second, your embedded ) is tripping you up. I suggest searching for each ( and then locating its matching ). Replace these with [ and ] respectively, so that only the block markers still end in ).
---
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
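The bracket-swapping suggestion above can be sketched with a small stack that pairs each ( with its matching ). This is a standalone Python 3 illustration, not SPSS syntax. mask_matched_parens is a hypothetical helper, and it assumes that [ and ] do not already occur in the text.

```python
def mask_matched_parens(text, left="[", right="]"):
    """Replace each matched ( ... ) pair with left/right; leave a lone ")" alone."""
    chars = list(text)
    open_stack = []                  # offsets of "(" still waiting for a ")"
    for i, ch in enumerate(text):
        if ch == "(":
            open_stack.append(i)
        elif ch == ")" and open_stack:
            chars[open_stack.pop()] = left   # the "(" that this ")" closes
            chars[i] = right
    return "".join(chars)

row2 = "a) ryba rybka rybeńka maleńka (np.sledzik) b) kotek c) piesek d) chomieczek"
row3 = "a) zenon b) marian i hela c) alekasadra(ola)"
print(mask_matched_parens(row2))
print(mask_matched_parens(row3))
```

After the swap, the only ) characters left in the sample rows are those of the block markers, so a plain search for a), b), c) and d) no longer hits text inside the parentheses. The brackets can be swapped back once the pieces are cut.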
In reply to this post by 88videos
As always, I got very useful advice here. Thanks!! As Jon and David mentioned, the most problematic issue is that "a)/b)/c)..." could be part of the text and appear many times in a case. It is not inconceivable that some of the work will have to be done under supervision.

(In reply to 88Videoclips . <[hidden email]>, 2017-07-05 2:14 GMT+02:00.)
Here is code that can handle grouped parentheses. It assumes that a pair (...) should not match the markers such as a) even if it contains that string. First it finds pairs and changes the parentheses to left and right chevrons, assuming that these would not occur in the text. Then it splits at a), b), ... and then puts the original parentheses back. The code follows, but if it gets mangled by the listserv, send me an email ([hidden email]), and I will send the code as a file.

Note that for test purposes I changed the input in case 2 to contain sledzia) instead of the original. The spssinc trans command creates string variables v1...v4 of length 50. Change the 50 as needed. This code can easily be tweaked for any number of blocks. If there are fewer than 4 blocks, the extra variables are blank. Statistics should be in Unicode mode for this.

* Encoding: UTF-8.
data list list /ID(f8.0) var1(a90).
begin data.
1 'a) żaba żabka albo żabeczka b) łapka'
2 'a) ryba rybka rybeńka maleńka (np.sledzia) b) kotek c) piesek d) chomieczek'
3 'a) zenon b) marian i hela c) alekasadra(ola)'
4
5
6
7
8
9
10
end data.
DATASET NAME base2.

begin program.
import re

def splitter(s):
    # Mask each (...) pair with chevrons so a marker inside it cannot match.
    # unichr is Python 2; under Python 3 use chr instead.
    s = re.sub(r"\((.*?)\)", unichr(171)+r"\1"+unichr(187), s)
    # Offsets of the markers a) .. d); m.start() stays correct even if a marker repeats.
    pos = [m.start() for m in re.finditer(r"[a-d]\)", s)]
    pos.append(len(s))
    # Cut the string between consecutive marker offsets.
    parts = [s[pos[i]:pos[i+1]] for i in range(len(pos)-1)]
    # Put the original parentheses back.
    parts = [re.sub(unichr(171), r"(", part) for part in parts]
    parts = [re.sub(unichr(187), r")", part) for part in parts]
    return parts
end program.

spssinc trans result=v1 v2 v3 v4 type=50
/formula "splitter(var1)".

(In reply to 88Videoclips . <[hidden email]>, Thu, Jul 6, 2017 at 5:50 AM.)
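To check the mask, split and restore steps outside SPSS, here is a hedged standalone Python 3 sketch run on the three test rows from the post above. split_blocks and its n_blocks argument are hypothetical additions. The padding with empty strings and the whitespace trimming are done here only to imitate the blank extra variables and are not part of the posted function.

```python
import re

LEFT, RIGHT = "\u00ab", "\u00bb"   # chevrons, assumed absent from the data

def split_blocks(text, n_blocks=4):
    """Mask ( ... ) pairs, cut at a) .. d), unmask, then pad to n_blocks."""
    masked = re.sub(r"\((.*?)\)", LEFT + r"\1" + RIGHT, text)
    starts = [m.start() for m in re.finditer(r"[a-d]\)", masked)]
    starts.append(len(masked))
    parts = [masked[starts[i]:starts[i + 1]].strip()
             for i in range(len(starts) - 1)]
    parts = [p.replace(LEFT, "(").replace(RIGHT, ")") for p in parts]
    # Pad with empty strings so every row yields the same number of values.
    return parts + [""] * (n_blocks - len(parts))

rows = [
    "a) żaba żabka albo żabeczka b) łapka",
    "a) ryba rybka rybeńka maleńka (np.sledzia) b) kotek c) piesek d) chomieczek",
    "a) zenon b) marian i hela c) alekasadra(ola)",
]
for row in rows:
    print(split_blocks(row))
```

The non-greedy pattern pairs each ( with the nearest following ), so nested parentheses would need a depth-aware scan instead.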