Non-printing characters

Non-printing characters

Luca Meyer
From time to time I import data directly from SQL Server, and some of the
fields are free-text fields that users fill in. When I run a FREQ on such
fields, I get a warning about non-printing characters in some records, and I
probably lose some data.

Is there a list of such non-printing characters? Has anyone found a
way to isolate or remove such characters while importing from SQL Server?

Thank you,
Luca

Mr. Luca MEYER
Market research, data analysis & more
www.lucameyer.com <http://www.lucameyer.com/>  - Tel: +39.339.495.00.21

Re: Non-printing characters

Maguin, Eugene
Luca,

In the back of the syntax reference (Appendix B) is a table of import/export
character sets. I've had this problem too, and I resorted to searching for
characters outside the 'normal' character set (a-z, A-Z, 0-9) and then
editing the database. The kinds of errors I remember finding were tab, CR
(carriage return), or LF (line feed) characters. I think that a cleverer
person could exploit some code Ray Levesque posted a couple of years ago to
convert each character to a position number and then search by position
number. I think you may have a hard, grueling job ahead of you if you fix
the database source, which you will need to do if you repeatedly export from
that table.

Gene Maguin

Re: Non-printing characters

Richard Ristow
In reply to this post by Luca Meyer
At 03:55 AM 12/6/2006, Luca Meyer wrote:

> From time to time I import data directly from SQL Server and some of
> them are free text fields the user might fill in. When I run a FREQ
> on such fields I get a warning about non-printing characters in
> some records and I probably lose some data.

You've seen Gene Maguin's response.

You might also try running your FREQUENCIES after assigning format AHEXm,
rather than An, to your string variables. (If you do this, 'm' must be
twice 'n'.) That will at least let you see what you have, though the
text won't be very readable.

If you have a character string containing the non-printing characters
you want to get rid of, you can use REPLACE (SPSS 14 and 15) or INDEX
and SUBSTRING (earlier versions) to substitute some printable
replacement. That means getting the string of non-printables, of
course. Characters x00 to x1F, plus x80 to x9F, would be a start.

SPSS makes this harder than it needs to be, because it doesn't support
character constants expressed in hexadecimal. (Sometimes I really miss
SAS.) There's probably a Python way around this, though.
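
As a rough illustration of the substitution idea, here is a minimal sketch in plain Python rather than SPSS syntax. The name `replace_suspect` and the `?` replacement are made up for the example, and it assumes one code per character, as in a single-byte code page. The ranges are just the "start" suggested above; whether x80 to x9F belongs in the list depends on the code page in use.

```python
# Illustrative sketch only: plain Python, not SPSS syntax.
# SUSPECT_CODES follows the x00-x1F plus x80-x9F starting list suggested above.
SUSPECT_CODES = list(range(0x00, 0x20)) + list(range(0x80, 0xA0))


def replace_suspect(text, replacement="?"):
    """Return text with every character whose code is in SUSPECT_CODES replaced."""
    table = {code: replacement for code in SUSPECT_CODES}
    return text.translate(table)


# A tab, a carriage return and a line feed each become "?".
print(replace_suspect("first\tsecond\r\n"))
```

An ordinary blank (x20) is outside both ranges, so normal text passes through unchanged.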

Re: Non-printing characters

Peck, Jon
In reply to this post by Luca Meyer
Non-printing characters could be coming from your original database (tabs, line feeds, carriage returns, and other characters with codes below decimal value 32). They could also occur if your database text is in Unicode and includes characters that cannot be displayed in the normal Windows Western European code page.

You can see what the codes for these are by changing the format of the variable to AHEX.  You can't do that in the data editor, but you can do it with
format thevariable(ahex16).
The number in the format should be twice the width of the field in A format.  The display will show a sequence of hexadecimal numbers, so, for example,
202020202020 ...
would be a sequence of blanks (decimal code 32).

I'm not sure, but if you run a frequencies after changing this format, you may get details on what these values are without the warnings.  Of course, the ordinary text will be unreadable, but this may help to diagnose the problem.
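
To show what such a hex display looks like, here is a small sketch in plain Python (not SPSS). The helper `to_hex` is hypothetical, and the `cp1252` encoding is an assumption standing in for the Windows Western European code page; it only mimics the idea of an AHEX display, two hex digits per character of a blank-padded field.

```python
def to_hex(text, width, encoding="cp1252"):
    """Pad text with blanks to the field width and show two hex digits per byte."""
    padded = text.ljust(width)
    return padded.encode(encoding).hex().upper()


# A 6-character field holding "a", a tab, "b" and three trailing blanks.
print(to_hex("a\tb", 6))  # 610962202020
```

The `09` in the middle is the tab; the trailing `202020` is the run of blanks described above.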

Regards,
Jon Peck

Re: Non-printing characters

Peck, Jon
In reply to this post by Richard Ristow
(Of course, there is a Python way, but I won't go into that here.)

Richard's advice is correct, except that most of the character values in x80 - x9F are actually legal printing characters in Windows. (They are not in the character sets typically used on Unix/Linux.) x80, for example, is the Euro symbol.
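
The difference can be checked with a short sketch in plain Python. It assumes that Python's `cp1252` codec represents the Windows code page and that `latin-1` (ISO 8859-1) represents a typical Unix/Linux character set; the helper `describe` is made up for the example.

```python
import unicodedata


def describe(byte_value):
    """Return (character under cp1252 or None, Unicode category under latin-1)."""
    raw = bytes([byte_value])
    try:
        windows_char = raw.decode("cp1252")
    except UnicodeDecodeError:
        windows_char = None  # no character assigned to this code in cp1252
    latin_char = raw.decode("latin-1")
    return windows_char, unicodedata.category(latin_char)


# Codes in x80-x9F that have no printing character in cp1252.
undefined = [hex(b) for b in range(0x80, 0xA0) if describe(b)[0] is None]

print(describe(0x80))
print(undefined)
```

Under these assumptions x80 decodes to the Euro symbol in the Windows code page, while the same byte falls in the control-character category ("Cc") in ISO 8859-1.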

Query and suggestions

Sebastián Daza
Dear all,
I am writing to the Spanish-speaking members of the list to ask for suggestions, based on your experience and knowledge, regarding consultancies and institutions that carry out public opinion studies (nationwide face-to-face surveys) in Argentina, Brazil, Peru, Venezuela, El Salvador, Guatemala, and Mexico, for a study we want to conduct in Latin America. What matters most is the rigor of the work and the reputation of the institution (reliable surveys). I look forward to your advice and comments.

--
Sebastián Daza Aranzaes
Instituto de Sociología UC
8-471 53 87 / 686 57 20 / Fax 5521834
[hidden email]

Re: Non-printing characters

Maguin, Eugene
In reply to this post by Luca Meyer
I think this posting from Ray Levesque on Jan 17, 2005 (7:55PM) may be
useful.

The following illustrates how to create a string variable containing a TAB
character. That variable can then be used in your ANY function.

DATA LIST LIST /str1(A2).
BEGIN DATA
79      /* letter O */
80      /* letter P */
09      /* TAB character */
END DATA.

STRING char(A1) str25(A25).
COMPUTE char=STRING(NUMBER(str1,N2),PIB1).
COMPUTE str25=CONCAT("---",char,"===").
LIST str25.

HTH

Raynald Levesque [hidden email]
Visit my SPSS site: http://www.spsstools.net



It would seem that you could search the string character by character and
convert each character to its 7-bit ASCII value using the above code
example. Strings having characters whose converted values are less than 48
(the code for the digit 0) could be flagged for editing.
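
A minimal sketch of that character-by-character search, in plain Python rather than SPSS syntax. The function name `flag_positions` is hypothetical. The default threshold of 32 is an assumption taken from the "codes below decimal value 32" description earlier in the thread; a threshold of 48 would also flag blanks and most punctuation.

```python
def flag_positions(text, threshold=32):
    """Return (1-based position, code) for each character whose code is below threshold."""
    return [(i + 1, ord(ch)) for i, ch in enumerate(text) if ord(ch) < threshold]


# Tab at position 3, carriage return at 6, line feed at 7.
print(flag_positions("ab\tcd\r\n"))

# With a threshold of 48, an ordinary blank (code 32) is flagged as well.
print(flag_positions("a b", threshold=48))
```

A record would then be flagged for editing whenever the returned list is not empty.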

Gene Maguin

Suggested market research consultancies in Peru...

Eduardo Romero
In reply to this post by Sebastián Daza
Hello everyone:

In my view, the companies and consultancies that carry out public opinion
studies in Peru have developed quite well. We have companies such as
Apoyo investigación y mercado, one of the best known and longest
established in the market. There is also Arellano investigación y Marketing,
which is establishing itself in the consultancy market as well, and there
are CPI, Datum, Inmark, IMA, and many more. I suggest asking the first
three about costs and products.

Kind regards,

Eduardo Romero.

Re: Non-printing characters

Richard Ristow
In reply to this post by Maguin, Eugene
At 05:04 PM 12/6/2006, Gene Maguin wrote:

>I think this posting from Ray Levesque on Jan 17, 2005 (7:55PM) may be
>useful. The following illustrates how to create a string variable
>containing a TAB character.

Thank you, Gene! Gosh, when all's said, Raynald really is the master of
us all.

I've adapted the code to make a crude hex-to-character converter; see
below. (I'm sure SPSS 15's ability to write transformations in Python
could do this a lot better.)
. To myself: no separately saved syntax or listing file
. There's no error checking of the input, except to terminate when
encountering a character that isn't valid as a hex digit.

NEW FILE.
DATA LIST FIXED
    / HEX_CHAR 01-20 (A).

Data List will read 1 records from the command file

Variable          Rec   Start     End  Format
HEX_CHAR            1       1      20  A20

BEGIN DATA
5261796e616c64
END DATA.

STRING CHAR(A20).
LOOP #POS = 01 TO 99 BY 2
             IF #POS LE LENGTH(HEX_CHAR) - 1.
.  NUMERIC    #HEX_HI
               #HEX_LO  (F2).
.  STRING     #HEX_DIG
               #ASCII   (A1).

.  COMPUTE    #HEX_DIG = LOWER(SUBSTR(HEX_CHAR,#POS,  1)).
.  COMPUTE    #HEX_HI  = INDEX('0123456789abcdef',#HEX_DIG).
.  DO IF      #HEX_HI GT 0.
.     COMPUTE #HEX_HI = #HEX_HI - 1.
.  ELSE.
.     BREAK.
.  END IF.

.  COMPUTE    #HEX_DIG = LOWER(SUBSTR(HEX_CHAR,#POS+1,1)).
.  COMPUTE    #HEX_LO  = INDEX('0123456789abcdef',#HEX_DIG).
.  DO IF      #HEX_LO GT 0.
.     COMPUTE #HEX_LO = #HEX_LO - 1.
.  ELSE.
.     BREAK.
.  END IF.

.  COMPUTE  #ASCII   = STRING(16*#HEX_HI+#HEX_LO,PIB1).
.  COMPUTE  CHAR     = CONCAT(RTRIM(CHAR),#ASCII).
END LOOP.
LIST.

Notes
|-----------------------------|---------------------------|
|Output Created               |06-DEC-2006 20:34:13       |
|-----------------------------|---------------------------|
HEX_CHAR             CHAR

5261796e616c64       Raynald

Number of cases read:  1    Number of cases listed:  1
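
For comparison, the same conversion can be sketched in plain Python. This is not the SPSS 15 Python integration mentioned above, and the function name `hex_to_text` is made up for the example. Like the SPSS loop, it works two hex digits at a time and stops at the first character that is not a valid hex digit, so blank padding ends the conversion.

```python
HEX_DIGITS = "0123456789abcdef"


def hex_to_text(hex_string):
    """Convert pairs of hex digits to characters, stopping at the first invalid digit."""
    chars = []
    lowered = hex_string.lower()
    for pos in range(0, len(lowered) - 1, 2):
        hi = HEX_DIGITS.find(lowered[pos])      # -1 if not a hex digit
        lo = HEX_DIGITS.find(lowered[pos + 1])
        if hi < 0 or lo < 0:
            break
        chars.append(chr(16 * hi + lo))
    return "".join(chars)


# The 14 hex digits from the listing, padded with blanks to 20 as in an A20 field.
print(hex_to_text("5261796e616c64".ljust(20)))  # Raynald
```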


>It would seem that you could search the string character by character
>and convert each character to its ASCII 7 bit value using the above
>code example. Strings having characters whose converted values were
>less than 48 (i.e. zero=0) could be flagged for editing.

That's going the other direction, and it has to be possible, but I
haven't tried it. (I've about run out of programming energy for the
day.)

Re: Query and suggestions

Ariana Mathieu
In reply to this post by Sebastián Daza
Dear Sebastián, regarding your query, I wanted to let you know that I run a
market research and public opinion agency in BA, Argentina. You can visit
our website, www.cribaweb.com.ar, and if you wish I can pass you some
client contacts whom you can ask for references about us (Telefonica,
Procter & Gamble, La Caja de Ahorro y Seguro, Banco Francés...). Please
count on me for anything you need regarding this study. As additional
information, I hold a degree in Political Science (USAL) and a master's
from FLACSO, and we have carried out regional studies (Chile, Uruguay,
Bolivia, Colombia). If you wish, I can pass you some details of companies
in other countries. Whatever you need, just let me know.
Regards,
Ariana

Ariana Mathieu
www.cribaweb.com.ar
Tel 54-11-43745400 int 106
[hidden email]