Tuesday, May 1, 2018

Corrupting characters - How to get invalid byte values stored in strings

While using the Database Migration Assistant for Unicode (DMU) to convert some databases from single-byte character sets to AL32UTF8, I ran into DMU reporting a lot of characters with invalid byte values (in this case, byte values that do not exist in WE8ISO8859P15).

So how can that happen? Doesn't the database enforce that stored character data matches the database character set?

Wrong - not always.

Well, OK, you can get invalid values with single-byte character sets. But once the database is AL32UTF8, it can store all the characters in the world, so then it cannot happen, right?

Wrong again - you can still get corrupt character data if you do it the wrong way.
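
To make "invalid byte values" concrete outside the database, here is a minimal Python sketch. It only illustrates the general encoding point and does not reproduce what Oracle or DMU do internally. It assumes that AL32UTF8 corresponds to standard UTF-8 and that Python's iso8859_15 codec is a fair stand-in for WE8ISO8859P15. The helper name is_valid_utf8 is made up for this example.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "æøå"

# The same three characters, encoded two different ways.
latin9_bytes = text.encode("iso8859_15")  # one byte per character
utf8_bytes = text.encode("utf-8")         # two bytes per character

print(latin9_bytes.hex(" "), is_valid_utf8(latin9_bytes))
print(utf8_bytes.hex(" "), is_valid_utf8(utf8_bytes))
```

The single-byte encoding of these characters is not well-formed UTF-8, so if such bytes were ever stored as-is in a UTF-8 column, they would be exactly the kind of data a validity check flags.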