Wednesday, May 16, 2007

Excel Sheet IMEX=1 and how text gets treated as numbers

I was surprised today to see how microsoft excel finds out the datatype of rows in a excel sheet.

Let us say you have an Excel sheet with 60 rows and the first column has first 20 rows as numbers and the next 40 rows as alphabets, some thing like,

UserIdentifier First name last name
10 John Smith
20 Steve Martin
....
....
....
ALAN Alan Smith

User identifier in this case is alphanumeric (can contain both numbers or alphabets)

If you import this sheet using ADO in ASP or in VB
excel will look at first 8 rows and decide for itself that the datatype of first column is number and hence will import null for Alan and all other non-numeric rows.

This is based on a registry setting,
The registry key where the settings described above are located is: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\3.5\Engines\Excel

called TypeGuesses = 8

So it looks for first 8 rows to decide what is the data type and then fixes the datatype.

the property IMEX=1 means treat all cols as text but still it too does not work.

http://support.microsoft.com/kb/194124

This is explained in the KB above.

The only way to fix this is to increase the type guesses in teh registry to a higher value.

So if you have a excel sheet say with 3000 rows and first 1000 rows are numbers and next 2000 rows are alphabets, it will work fine only if typeguesses = 2001 otherwise it will import spaces for every alphabet.

This is totally crazy and Microsoft should have come up with some other better way of finding the datatype in a excel sheet.