SPSSX Discussion

automatically concatenate one string with many variables

Lucas Bremer

automatically concatenate one string with many variables

Hello everybody,

I'm trying to concatenate one string variable with, let's say, 100 other
numeric variables.

My data looks like this:

STRING  N1   N2   N3   ...  N100
AAAA    100  345  456  ...  502
AAAB    056  101  159  ...  312

...

The resulting data should have this format

STRING  C1       C2       C3       ...  C100
AAAA    AAAA100  AAAA345  AAAA456  ...  AAAA502
AAAB    AAAB056  AAAB101  AAAB159  ...  AAAB312

...

I know that I can use concat(var1,var2,...), but I don't know how to use it in
combination with a loop.

Thanks in advance.

Bye, Lucas

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Marta Garcia-Granero

Re: automatically concatenate one string with many variables

Hi Lucas

Lucas Bremer wrote:

> I know that I can use concat(var1,var2,...), but I don't know how to use it in
> combination with a loop.
>

Try this code (it assumes that the final string length is 7 characters,
4+3; modify it if needed):

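* Declare the 100 target variables as 7-character strings (4 + 3).
STRING C1 TO C100 (A7).
* Pair each target C with its source N. The N3 format keeps leading zeros.
DO REPEAT A=C1 TO C100
 /B=N1 TO N100.
COMPUTE A=CONCAT(STRING,STRING(B,N3)).
END REPEAT.
EXECUTE.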

LIST STRING C1 TO C100.
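
For readers who want to check the expected result outside SPSS, here is a minimal Python sketch of what the loop produces for one case. The function name and the `width` parameter are hypothetical; it only mirrors the idea that each number is written with three digits and leading zeros (as the N3 format does) before being appended to the string.

```python
def concat_with_codes(label, numbers, width=3):
    """Append each number, zero-padded to `width` digits, to the label.

    Hypothetical helper that mimics CONCAT(STRING, STRING(B, N3)) for one row.
    """
    return [f"{label}{n:0{width}d}" for n in numbers]


# One row from the example data: the value 56 must come out as "056".
row = concat_with_codes("AAAB", [56, 101, 159, 312])
print(row)
```

Without the zero padding, AAAB056 would come out as AAAB56, which is why the numeric format matters here.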

Regards,
Marta García-Granero

--
For miscellaneous statistical stuff, visit:
http://gjyp.nl/marta/

Juris Breidaks

How better can detect outliers...

In reply to this post by Lucas Bremer

Hello everybody,

I'm trying to detect outliers in my data. I have data series ranging from 6 to 24 months. Can you recommend a method that would help detect outliers better? The data will later be used for regression estimation.

Thanks in advance. Bye, Juris

Fred Weigel-2

Re: How better can detect outliers...

Juris,

I start by stating that I am only in my second semester of my doctoral program and in my second statistics course, so if you take my comments, you may want to confirm them (although I am sure someone on the list will correct me, too). My information is based on simple regression.

First, if you have outliers with a small n, you have a serious problem because the effects of the outliers are magnified. If you have a large n with few outliers, you may not have a problem at all. The important thing is that you don't want to just remove the outliers without careful analysis: this can be ethically wrong and academically dishonest, besides giving an inaccurate representation of your data. If, however, you find that the outliers are due to errors in data entry, misunderstanding of a survey question, or other recording errors, you may have grounds for removing them. You don't want to take the removal of outliers from your model lightly.

You will have to run a regression to get the residuals for these checks. Semistudentized or deleted semistudentized residuals will be easier to understand because they give you a zero baseline and your deviations are measured in standard deviations. There are differences of opinion, but a rough rule of thumb is that residuals with an absolute value of 4 or more (I've seen 3) are considered outliers.
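
As a rough illustration of that rule of thumb, here is a minimal Python sketch for simple regression. It assumes the common textbook definition of a semistudentized residual, e_i / sqrt(MSE) with MSE = SSE / (n - 2); the function names and the default cutoff of 4 are only illustrative, and this is not what SPSS runs internally.

```python
import math


def semistudentized_residuals(x, y):
    """Fit y = b0 + b1*x by least squares and return e_i / sqrt(MSE)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # MSE uses n - 2 degrees of freedom (two estimated coefficients).
    # Assumes the fit is not perfect, otherwise MSE is zero.
    root_mse = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [e / root_mse for e in residuals]


def flag_outliers(x, y, cutoff=4.0):
    """Return the indices of cases whose |semistudentized residual| >= cutoff."""
    scaled = semistudentized_residuals(x, y)
    return [i for i, r in enumerate(scaled) if abs(r) >= cutoff]


# Illustrative data: a straight line with one grossly wrong value at index 15.
x = list(range(30))
y = [2 * xi + 1 for xi in x]
y[15] += 100
print(flag_outliers(x, y))
```

Under this definition no residual can exceed sqrt(n - 2) in absolute value, so a cutoff of 4 can only trigger when n is at least 18. That is one more reason to be careful with short series.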

Graphical methods (these work primarily for simple regression; if you have multiple predictor variables, this gets harder to interpret):

If you have a large n, you can use a box plot of the residuals, a histogram of the residuals, you can compare the actual frequencies of the residuals with expected frequencies, or you can do a normal probability plot of the residuals. The normal probability plot is the only one that will give you reliable results for a small n.
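
To make the normal probability plot concrete, here is a minimal Python sketch that pairs each ordered residual with the standard normal score expected for its rank. The plotting position (k - 0.375) / (n + 0.25) is an assumption on my part (it is Blom's approximation, one common choice), and the function name is hypothetical.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Return (expected normal score, ordered residual) pairs.

    Uses the plotting position (k - 0.375) / (n + 0.25) for the k-th
    smallest residual, which is an assumed convention, not the only one.
    """
    n = len(residuals)
    ordered = sorted(residuals)
    standard_normal = NormalDist()
    scores = [
        standard_normal.inv_cdf((k - 0.375) / (n + 0.25))
        for k in range(1, n + 1)
    ]
    return list(zip(scores, ordered))


# Illustrative residuals only.
for score, residual in normal_plot_points([0.3, -1.2, 0.1, 2.0, -0.4]):
    print(f"{score:7.3f}  {residual:6.2f}")
```

If the residuals are roughly normal, these pairs fall close to a straight line. A point far off the line at either end is a candidate outlier.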

The EXPLORE command in SPSS will give you boxplots that indicate your outliers with an O and if you have any extreme outliers, they'll be shown with an asterisk *.
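
The two boxplot symbols follow a distance rule: a value more than 1.5 box lengths beyond the edge of the box is marked as an outlier, and a value more than 3 box lengths beyond it is marked as extreme. Here is a minimal Python sketch of that rule. It is an approximation: SPSS builds the box from Tukey's hinges, while this sketch uses the quartiles from Python's `statistics.quantiles`, so borderline cases can differ. The function name is hypothetical.

```python
import statistics


def boxplot_flags(values):
    """Label each value "" (inside the fences), "O" (outlier) or "*" (extreme)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1  # interquartile range
    flags = []
    for v in values:
        # Distance beyond the nearer edge of the box; 0 if inside the box.
        distance = max(q1 - v, v - q3, 0.0)
        if distance > 3 * box_length:
            flags.append("*")
        elif distance > 1.5 * box_length:
            flags.append("O")
        else:
            flags.append("")
    return flags


# Illustrative data: 20 is a mild outlier, 40 an extreme one.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]
print(list(zip(data, boxplot_flags(data))))
```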

Keep in mind that if you are doing regression, you have other things to check, too: linearity, homoscedasticity, independence of the error terms, whether important predictor variables have been omitted from the regression model, and normality. Do the normality check last because the corrections for other deviations can have large effects on normality. Non-normality can also have serious effects on your results, so you don't want to overlook this.

There is more you can do, but that will get you started and I have to go to class.

I hope this helps,

Fred

All the best,

Fred Weigel

Doctoral Student
[hidden email]

College of Business

427 Lowder Business Building
415 West Magnolia Avenue

Auburn University
Auburn, AL 36849
Phone: 334-844-6538
Fax: 334-844-5159
