a useful improvement?

a useful improvement?

Maguin, Eugene

I’m curious whether others would see this as a useful improvement to SPSS.

 

The situation: Adding files with string variables whose widths differ but which have the same name.

Currently, Add files fails if the same-named string variable has different widths in different files.

The improvement: The width of a string variable in the file produced by an add files command is defined to be the maximum width of that variable across the files being added. A message is also printed in the log noting that the width of that variable differed across the added files and was reset to the maximum width encountered.
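As a rough illustration of the proposed rule (not of how SPSS works internally), the width planning can be sketched in Python. The dictionaries below are made-up stand-ins for each file's string variables and their declared widths, and `plan_string_widths` is a hypothetical helper name, not an SPSS or Python API.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for a file's dictionary: string variable name -> declared width.
FileDict = Dict[str, int]


def plan_string_widths(files: List[FileDict]) -> Tuple[Dict[str, int], List[str]]:
    """Return (output widths, log messages) for same-named string variables."""
    seen: Dict[str, Set[int]] = {}
    for file_dict in files:
        for name, width in file_dict.items():
            seen.setdefault(name, set()).add(width)

    # The output width is the maximum width encountered for each variable.
    widths = {name: max(ws) for name, ws in seen.items()}
    # Only variables whose widths actually differed produce a log message.
    messages = [
        f"Width of {name} differed across files {sorted(ws)}; reset to {max(ws)}."
        for name, ws in seen.items()
        if len(ws) > 1
    ]
    return widths, messages


# Three illustrative "files" with invented variables and widths.
widths, messages = plan_string_widths([
    {"city": 10, "id": 8},
    {"city": 25, "id": 8},
    {"city": 12},
])
print(widths)
print(messages)
```

Note that the widths can only be planned after every input dictionary has been read, which is the extra pass the rule would require.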

 

It seems that currently this has to be done by hand, as it were, by opening the files, looking at either data->variables or display variables and then using alter type and resaving.

 

The same issue comes up, I’m sure, in a match files operation or an update operation and, perhaps, the same improvement might also be used.

 

Are there situations where this improvement would be a bad idea?

 

Gene Maguin

Re: a useful improvement?

Jon K Peck
This would be a good idea. I don't see any downside, although I don't know how hard it would be to implement architecturally. Right now, though, you can use the STATS ADJUST WIDTHS extension command to make the string widths consistent across a set of sav files.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        "Maguin, Eugene" <[hidden email]>
To:        [hidden email],
Date:        07/24/2013 07:44 AM
Subject:        [SPSSX-L] a useful improvement?
Sent by:        "SPSSX(r) Discussion" <[hidden email]>





Re: a useful improvement?

Frans Marcelissen-5
In reply to this post by Maguin, Eugene
I fully agree with this. The problem with the string length is absolutely the worst feature of SPSS.
My current workaround is to set all string variables to a very large width (alter type all (a=a1000). add files ..... alter type all (a=amin).), but I don't see any reason why the maximum length of the strings is not used.
Frans

--

 
-------------------
dr F.H.G. (Frans) Marcelissen
DigiPsy (www.DigiPsy.nl)
Pomperschans 26
5595 AV Leende
tel: 040 2065030/06 2325 06 53
skype adres: frans.marcelissen
email: [hidden email]
 

Re: a useful improvement?

Bruce Weaver
Administrator
In reply to this post by Maguin, Eugene
Great suggestion, Gene.  I can't immediately think of any situations where resetting all string variables to the maximum width would be a bad thing.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: a useful improvement?

David Marso
Administrator
In reply to this post by Maguin, Eugene
That would be very useful!
I proposed something of this sort about 15 years ago and in fact implemented a prototype in VB6 using the SPSSIO32.DLL. Unfortunately TPTB didn't see it as a viable use of development resources ;-(

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: a useful improvement?

Andy W
In reply to this post by Maguin, Eugene
Ahh - I spoke too hastily - I was confusing this with not needing to sort the files for the match tables (not sure why I made that brain fart). Again, my workflow for such a situation is:

alter type ALL (A = A255). /*Just some string width I know will be sufficiently large - do this for both files.
*add files here.
alter type ALL (A = AMIN).

I made the mistake previously of saying

alter type (ALL = AMIN).

where I should have said:

alter type ALL (A = AMIN).
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: a useful improvement?

Art Kendall
In reply to this post by Maguin, Eugene
It is certainly a great idea.
MATCH FILES etc. create a map for the output file.
It details which string variables are incompatible because of their length.
For variables for which no incompatibilities are found, nothing needs to be done.
Since the lengths of the input strings are known, it seems straightforward to set the output variable to the max of the lengths.
Art Kendall
Social Research Consultants
If you reply to this email, your message will be added to the discussion below:
http://spssx-discussion.1045642.n5.nabble.com/a-useful-improvement-tp5721327.html

Re: a useful improvement?

Jon K Peck
In order to set the length, ADD/MATCH would need to know all the sizes before starting. That means opening all of the input files (up to 50), possibly altering the size in the active file before starting the matching, and then adjusting the data as needed.  Conceptually simple, but there's a lot of code in the way.  Since ADD/MATCH are transformations, the whole transformation system would need to get involved.

I'm sure it could be done, but I wouldn't assume that it is straightforward.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        Art Kendall <[hidden email]>
To:        [hidden email],
Date:        07/24/2013 10:59 AM
Subject:        Re: [SPSSX-L] a useful improvement?
Sent by:        "SPSSX(r) Discussion" <[hidden email]>





Re: a useful improvement?

Art Kendall

MATCH FILES needs to open the files to get the info for the map.

Does it need to adjust the data in the input file, or just in the output file?
Does it create the output file before it has collected all of the info from the input files?
Art Kendall
Social Research Consultants
On 7/24/2013 1:10 PM, Jon K Peck wrote:
In order to set the length, ADD/MATCH would need to know all the sizes before starting, which means opening all up to 50 files and possibly altering the size in the active file before starting the matching and then adjusting the data as needed. Conceptually simple, but there's a lot of code in the way. Since ADD/MATCH are transformations, the whole transformation system would need to get involved.

I'm sure it could be done, but I wouldn't assume that it is straightforward.



Re: a useful improvement?

Richard Ristow
In reply to this post by Maguin, Eugene
At 09:42 AM 7/24/2013, Maguin, Eugene wrote:

>I'm curious whether others would see this as a useful improvement to spss.
>
>The situation: Adding files with string variables whose widths
>differ but which have the same name.
>Currently, Add files fails if the same-named string variable has
>different widths in different files.
>The improvement: The width of a string variable in the resultant
>file of an add files command is defined to be the maximum width of
>that variable encountered across the files being added.

Sigh. You wouldn't believe the number of times this has been
suggested over the years -- I've long lost count of how often I've
suggested it. It's particularly maddening because, if you import data
from Excel files, the resulting width of string variables depends on
the values in the particular file. Import multiple datasets on the
same topic (with the same set of variables), and you're almost
guaranteed the resulting datasets won't be compatible for ADD FILES.

At 04:45 PM 12/7/2012, Jon K Peck wrote, in thread "import and merge
multiple Excel files":

>You make a good point about possible string width variation.  If
>there are string variables and [they do not all come in the same
>length], the solution is to add an ALTER TYPE command to the syntax
>that sets the widths as needed before running the ADD FILES command.
>
>As for ADD FILES, no doubt it would be nice to be more tolerant of
>width variations.  The problem is that the variable definitions have
>to be established when the variable is first encountered, and string
>widths are immutable except for what ALTER TYPE can do.  The new
>STAR JOIN command tolerates variation in key widths, but it is not
>appropriate for this application.

OK, if that's how it has to be; but even an option to take the
*first* width encountered would be an improvement. In the first
place, you'd get *some* data, not a totally failed run (it'd be fine
to issue a warning when string variables are truncated); in the
second place, you could start the ADD FILES with a one-record dataset
where all the strings have the maximum desired length.
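
The "first width encountered" option, and the seeding trick, can be
illustrated with a small Python sketch. This is only a model of the
suggested behaviour, not of ADD FILES itself: the function name
add_files_first_width, the (dictionary, records) tuples, and the sample
data are all assumptions made up for the example.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
# Hypothetical dataset model: (string variable widths, list of records).
Dataset = Tuple[Dict[str, int], List[Record]]


def add_files_first_width(
    datasets: List[Dataset],
) -> Tuple[Dict[str, int], List[Record], List[str]]:
    """Stack datasets; each string variable keeps the first width encountered."""
    widths: Dict[str, int] = {}
    rows: List[Record] = []
    warnings: List[str] = []
    for declared, records in datasets:
        for name, width in declared.items():
            widths.setdefault(name, width)  # the first width wins
        for record in records:
            out: Record = {}
            for name, value in record.items():
                limit = widths[name]
                if len(value) > limit:
                    warnings.append(f"{name}: {value!r} truncated to width {limit}")
                # Truncate if too long, pad with blanks if too short.
                out[name] = value[:limit].ljust(limit)
            rows.append(out)
    return widths, rows, warnings


inputs: List[Dataset] = [
    ({"city": 5}, [{"city": "Paris"}]),
    ({"city": 9}, [{"city": "Amsterdam"}]),
]

# Without a seed, the narrow first file fixes the width and data is truncated.
_, rows, warnings = add_files_first_width(inputs)

# With a one-record seed dataset declared at the maximum desired width,
# nothing is truncated.
seed: Dataset = ({"city": 9}, [{"city": ""}])
_, seeded_rows, seeded_warnings = add_files_first_width([seed] + inputs)
```

The seed record stays in the stacked result and would have to be
dropped afterwards.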

One problem with the ALTER TYPE solution is it's meta-programming:
the code depends on the data dictionary. OK, Python solves everything
... but, goodness, a straightforward native-mode solution would be nice.

At 10:09 AM 7/24/2013, Andy W wrote:
>Good news! I believe this is implemented in the newest release (V22).

Well, if that's so, I'm glad. But I believe ADD FILES was added when
SPSS changed to (then called) SPSSX, which means 21 releases to fix a
glitch that's been annoying from the beginning.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: a useful improvement?

Art Kendall
In reply to this post by Jon K Peck
If I have 50 files I want to merge, why would SPSS need to alter the length of the input variables? Why is it not possible to just change the length of a target variable in a new dataset?
string newvar(a50) oldvar(a25).
COMPUTE newvar = oldvar.
is a legitimate transformation.
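The same idea, copying a shorter fixed-width string into a wider target, can be mimicked in Python as below. This is just a sketch of blank-padding under the assumption that strings are stored right-padded to their declared width; `widen` is a made-up helper, not part of SPSS.

```python
def widen(value: str, old_width: int, new_width: int) -> str:
    """Copy a fixed-width string into a wider field, padding with blanks."""
    if new_width < old_width:
        raise ValueError("target must be at least as wide as the source")
    return value[:old_width].ljust(new_width)


# An A25 value copied into an A50 target: content unchanged, only padding added.
oldvar = "Leende".ljust(25)
newvar = widen(oldvar, 25, 50)
print(len(oldvar), len(newvar))
```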
Art Kendall
Social Research Consultants