From time to time I import data directly from SQL Server, and some of the fields are free-text fields that users fill in. When I run FREQ on such fields I get a warning about non-printing characters in some records, and I probably lose some data. Is there a list of these non-printing characters? Has anyone found a way to isolate or remove such characters while importing from SQL Server?
Thank you,
Luca
Mr. Luca MEYER, Market research, data analysis & more, www.lucameyer.com, Tel: +39.339.495.00.21
Luca,
In the back of the Syntax Reference (Appendix B) is a table of import/export character sets. I've had this problem too, and I resorted to searching for characters outside the 'normal' character set (a-z, A-Z, 0-9) and then editing the database. The offending characters I remember finding were tab, CR (carriage return) and LF (line feed). I think a cleverer person could exploit some code Raynald Levesque posted a couple of years ago to convert each character to its position number (its code in the character set) and then search by that number. I think you may have a hard, grueling job ahead of you if you fix the database source, but you will need to do that if you repeatedly export from that table.
Gene Maguin
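Gene's manual search (look for anything outside a-z, A-Z, 0-9 and note where it sits) is easy to prototype outside SPSS. The following is a minimal Python sketch, assuming the free-text values have already been read into Python strings; `find_unusual_chars` and its `extra_allowed` parameter are hypothetical names for illustration, not part of SPSS.

```python
import string

# The "normal" set from Gene's post: a-z, A-Z, 0-9.
NORMAL = set(string.ascii_letters + string.digits)


def find_unusual_chars(text, extra_allowed=" "):
    """Return (position, code) pairs for characters outside the normal set.

    Positions are 1-based, as in SPSS SUBSTR/INDEX. extra_allowed
    (hypothetical parameter) whitelists blanks, punctuation, etc., so that
    only the suspicious characters are reported.
    """
    allowed = NORMAL | set(extra_allowed)
    return [(pos, ord(ch)) for pos, ch in enumerate(text, start=1)
            if ch not in allowed]


# A free-text value containing a TAB (code 9) and a CR/LF pair (13, 10).
sample = "good\tcomment\r\nhere"
print(find_unusual_chars(sample))  # [(5, 9), (13, 13), (14, 10)]
```

Reporting the code alongside the position tells you both where to edit and which character (tab, CR, LF, ...) you are dealing with.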
In reply to this post by Luca Meyer
At 03:55 AM 12/6/2006, Luca Meyer wrote:
> From time to time I import data directly from SQL Server and some of
> them are free text fields the user might fill in. When I run a FREQ
> on such fields I get a warning about non-printing characters in
> some records and I probably lose some data.

You've seen Gene Maguin's response. You might also try running FREQUENCIES with format AHEXm, rather than An, assigned to your string variables. (If you do this, m must be twice n.) That will at least let you see what you have, though the text won't be very readable.

If you have a character string containing the non-printing characters you want to get rid of, you can use REPLACE (SPSS 14 and 15) or INDEX and SUBSTR (earlier versions) to substitute a printable replacement. That means building the string of non-printables first, of course. Characters x00 to x1F, plus x80 to x9F, would be a start. SPSS makes this harder than it needs to be, because it doesn't support character constants expressed in hexadecimal. (Sometimes I really miss SAS.) There's probably a Python way around this, though.
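In plain Python, hexadecimal character constants are no problem, so the "string of non-printables" Richard describes is easy to build. A minimal sketch, assuming the values are already decoded into Python strings; `NON_PRINTABLES` and `replace_nonprinting` are hypothetical helpers, not part of SPSS or its Python plug-in.

```python
# Richard's suggested starting set: x00-x1F plus x80-x9F.
# (As Jon Peck points out in this thread, most of x80-x9F are printable
# in Windows code page 1252, so drop that range for cp1252 text.)
NON_PRINTABLES = ("".join(chr(c) for c in range(0x00, 0x20))
                  + "".join(chr(c) for c in range(0x80, 0xA0)))


def replace_nonprinting(text, replacement=" ", chars=NON_PRINTABLES):
    """Substitute `replacement` for every character listed in `chars`.

    Same idea as applying REPLACE once per unwanted character, done here
    in a single pass with a translation table.
    """
    table = {ord(ch): replacement for ch in chars}
    return text.translate(table)


print(repr(replace_nonprinting("line1\r\nline2\x85")))  # 'line1  line2 '
```

Passing `replacement=""` deletes the characters instead of blanking them; blanking keeps field positions stable, which can help when comparing against the source.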
In reply to this post by Luca Meyer
Non-printing characters could be coming from your original database (tabs, line feeds, carriage returns and other characters with codes below decimal 32). They could also occur if your database text is in Unicode and includes characters that cannot be displayed in the normal Windows Western European code page.
You can see what the codes for these are by changing the format of the variable to AHEX. You can't do that in the Data Editor, but you can do it with syntax such as FORMATS thevariable (AHEX16). The number in the format should be twice the width of the field in A format (AHEX16 for an A8 variable). The display will show a sequence of hexadecimal numbers, so, for example, 202020202020 ... would be a sequence of blanks (decimal code 32). I'm not sure, but if you run FREQUENCIES after changing the format, you may get details on what these values are without the warnings. Of course, the ordinary text will be unreadable, but this may help to diagnose the problem.
Regards,
Jon Peck
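To see why the AHEX width must be twice the A width: every stored byte is shown as two hex digits. A minimal Python sketch that mimics that display, assuming a single-byte code page (cp1252 here); `to_ahex` is a hypothetical helper, and the upper-case digits are a choice made for this sketch.

```python
def to_ahex(value, width, encoding="cp1252"):
    """Mimic an AHEX display of a value stored in an A<width> field.

    The value is padded with blanks (or truncated) to the field width,
    then each byte is shown as two hex digits, so the result is always
    2 * width characters long.
    """
    padded = value.ljust(width)[:width]
    return padded.encode(encoding).hex().upper()


print(to_ahex("", 6))       # 202020202020  (six blanks, code 32 = x20)
print(to_ahex("a\tb", 4))   # 61096220      (the 09 is a TAB)
```

A TAB, CR (0D) or LF (0A) stands out immediately in this form, even though the surrounding text is hard to read.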
In reply to this post by Richard Ristow
(Of course, there is a Python way, but I won't go into that here.)
Richard's advice is correct, except that most of the character values in x80-x9F are actually legal printing characters in Windows (code page 1252). (They are not in the character sets typically used on Unix/Linux.) x80, for example, is the Euro symbol.
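Both of Jon's code-page points can be checked with Python's built-in codecs: which bytes in x80-x9F are printable under Windows-1252 versus Latin-1 (ISO-8859-1, common on Unix/Linux), and which Unicode characters have no Windows-1252 code at all. A minimal sketch; the function names are hypothetical.

```python
def printable_high_bytes(encoding):
    """List the byte values x80-x9F that decode to a printable character."""
    found = []
    for b in range(0x80, 0xA0):
        try:
            ch = bytes([b]).decode(encoding)
        except UnicodeDecodeError:
            continue  # byte is not assigned in this code page
        if ch.isprintable():
            found.append(b)
    return found


def not_in_cp1252(text):
    """Return the characters of `text` that have no Windows-1252 code."""
    bad = []
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


print(len(printable_high_bytes("cp1252")))        # 27 of the 32 values
print(len(printable_high_bytes("latin-1")))       # 0: all are C1 controls
print(hex(ord(bytes([0x80]).decode("cp1252"))))   # 0x20ac (EURO SIGN)
print([hex(ord(c)) for c in not_in_cp1252("caf\u00e9 \u2603")])  # ['0x2603']
```

So a blanket "strip x80-x9F" rule is only safe for data that are not Windows-1252 text; for cp1252 data it would remove curly quotes, dashes and the Euro sign.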
Dear all,
I am writing to the Spanish-speaking members of the list to ask for your suggestions, based on your experience and knowledge, regarding consulting firms and institutions that conduct public opinion studies (nationwide face-to-face surveys) in Argentina, Brazil, Peru, Venezuela, El Salvador, Guatemala and Mexico, for a study we want to carry out in Latin America. What matters most is the rigor of the work and the prestige of the institution (reliable surveys). Looking forward to your advice and comments...
--
Sebastián Daza Aranzaes
In reply to this post by Luca Meyer
I think this posting from Raynald Levesque on Jan 17, 2005 (7:55 PM) may be useful:

    The following illustrates how to create a string variable containing
    a TAB character. That variable can then be used in your ANY function.

    DATA LIST LIST /str1(A2).
    BEGIN DATA
    79 /* letter O */
    80 /* letter P */
    09 /* TAB character */
    END DATA.
    STRING char(A1) str25(A25).
    COMPUTE char=STRING(NUMBER(str1,N2),PIB1).
    COMPUTE str25=CONCAT("---",char,"===").
    LIST str25.

    HTH
    Raynald Levesque
    Visit my SPSS site: http://www.spsstools.net

It would seem that you could search the string character by character and convert each character to its 7-bit ASCII value using the above code example. Strings having characters whose converted values were less than 48 (the code for the digit 0) could be flagged for editing.
Gene Maguin
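Raynald's code goes from a numeric code to a character; Gene's idea goes the other way, from each character to its code. Both directions are one-liners in Python (`chr` and `ord`). A minimal sketch; `flag_low_codes` and its `threshold` parameter are hypothetical, and the default of 32 is a deliberate departure from Gene's 48, explained in the comment.

```python
# Raynald's direction: build a character from its numeric code
# (the SPSS code does this with STRING(NUMBER(str1,N2),PIB1)).
TAB = chr(int("09"))


def flag_low_codes(text, threshold=32):
    """Gene's direction: convert each character to its code and report
    whether any code falls below `threshold`.

    With 32, only control characters (TAB, LF, CR, ...) are caught.
    Gene's 48 would also flag blanks (32) and punctuation such as
    '!' (33) or ',' (44).
    """
    return any(ord(ch) < threshold for ch in text)


print("---" + TAB + "===")
print(flag_low_codes("plain text"))                # False
print(flag_low_codes("bad" + TAB + "value"))       # True
print(flag_low_codes("plain text", threshold=48))  # True: blank is 32
```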
In reply to this post by Sebastián Daza
Hello everyone:
In my view, in Peru the companies and consultancies that conduct public opinion studies have developed quite well. There are firms such as Apoyo investigación y mercado, one of the best known and longest established in the market; Arellano investigación y Marketing, which is also building its position among consultancies; and CPI, Datum, Inmark, IMA and many more. But I suggest checking costs and products with the first three.
Best regards,
Eduardo Romero
In reply to this post by Maguin, Eugene
At 05:04 PM 12/6/2006, Gene Maguin wrote:
> I think this posting from Raynald Levesque on Jan 17, 2005 (7:55 PM) may be
> useful. The following illustrates how to create a string variable
> containing a TAB character.

Thank you, Gene! Gosh, when all's said, Raynald really is the master of us all. I've adapted the code to make a crude hex-to-character converter; see below. (I'm sure SPSS 15's ability to write transformations in Python could do this a lot better.)

(Note to myself: no separately saved syntax or listing file.) There's no error checking of the input, except that the loop terminates when it encounters a character that isn't a valid hex digit.

    NEW FILE.
    DATA LIST FIXED / HEX_CHAR 01-20 (A).

    Data List will read 1 records from the command file
    Variable  Rec  Start  End  Format
    HEX_CHAR    1      1   20  A20

    BEGIN DATA
    5261796e616c64
    END DATA.
    STRING CHAR(A20).
    LOOP #POS = 01 TO 99 BY 2 IF #POS LE LENGTH(HEX_CHAR) - 1.
    .  NUMERIC #HEX_HI #HEX_LO (F2).
    .  STRING  #HEX_DIG #ASCII (A1).
    .  COMPUTE #HEX_DIG = LOWER(SUBSTR(HEX_CHAR,#POS, 1)).
    .  COMPUTE #HEX_HI  = INDEX('0123456789abcdef',#HEX_DIG).
    .  DO IF #HEX_HI GT 0.
    .     COMPUTE #HEX_HI = #HEX_HI - 1.
    .  ELSE.
    .     BREAK.
    .  END IF.
    .  COMPUTE #HEX_DIG = LOWER(SUBSTR(HEX_CHAR,#POS+1,1)).
    .  COMPUTE #HEX_LO  = INDEX('0123456789abcdef',#HEX_DIG).
    .  DO IF #HEX_LO GT 0.
    .     COMPUTE #HEX_LO = #HEX_LO - 1.
    .  ELSE.
    .     BREAK.
    .  END IF.
    .  COMPUTE #ASCII = STRING(16*#HEX_HI+#HEX_LO,PIB1).
    .  COMPUTE CHAR = CONCAT(RTRIM(CHAR),#ASCII).
    END LOOP.
    LIST.

    Notes: Output Created 06-DEC-2006 20:34:13

    HEX_CHAR        CHAR
    5261796e616c64  Raynald

    Number of cases read:  1    Number of cases listed:  1

> It would seem that you could search the string character by character
> and convert each character to its ASCII 7-bit value using the above
> code example. Strings having characters whose converted values were
> less than 48 (i.e. zero=0) could be flagged for editing.
That's going the other direction, and it has to be possible, but I haven't tried it. (I've about run out of programming energy for the day.)
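Richard's hex-to-character converter can be cross-checked outside SPSS. A minimal Python sketch of the same loop, assuming the hex digits arrive as a Python string; `hex_to_text` is a hypothetical name, and codes above x7F come out as Latin-1 code points here, which may not match the SPSS code page.

```python
HEX_DIGITS = "0123456789abcdef"


def hex_to_text(hex_chars):
    """Decode pairs of hex digits into characters.

    Like the SPSS loop, it stops at the first character that is not a
    valid hex digit (e.g. the blank padding of an A20 field); there is
    no other error checking.
    """
    result = []
    for pos in range(0, len(hex_chars) - 1, 2):
        hi = HEX_DIGITS.find(hex_chars[pos].lower())
        lo = HEX_DIGITS.find(hex_chars[pos + 1].lower())
        if hi < 0 or lo < 0:
            break
        result.append(chr(16 * hi + lo))
    return "".join(result)


print(hex_to_text("5261796e616c64"))            # Raynald
print(hex_to_text("5261796e616c64".ljust(20)))  # Raynald (stops at padding)
```

One difference worth noting: the SPSS version appends with CONCAT(RTRIM(CHAR),#ASCII), so a decoded blank (hex 20) appears to be trimmed away when the next character is added, whereas this sketch keeps it.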
In reply to this post by Sebastián Daza
Dear Sebastián, regarding your query, I wanted to let you know that I run a market research and public opinion agency in Buenos Aires, Argentina. You can visit our website, www.cribaweb.com.ar, and if you like I can pass on some client contacts you could ask for references about us (Telefonica, Procter & Gamble, La Caja de Ahorro y Seguro, Banco Francés...). You can count on me for anything you need for this study. As additional information, I hold a degree in Political Science (USAL) and a master's from FLACSO, and we have carried out regional studies (Chile, Uruguay, Bolivia, Colombia). If you like, I can also give you details of companies in other countries. Whatever you need, just let me know.
Regards,
Ariana
Ariana Mathieu
www.cribaweb.com.ar
Tel 54-11-43745400 ext. 106