
A way to deal with Schei* encoding - deal with "Non-ISO extended-ASCII"

We had some encoding issues again.
*file* returns an output like "Non-ISO extended-ASCII". This time, I wrote down a basic sequence of steps.
In the end, it really is a brute-force approach, and it relies heavily on open source software (thanks again, dudes!). The steps are based on this post from superuser.com.

# create a list with supported encodings
iconv --list | sed 's/\/\/$//' | sort > list_with_supported_encodings.txt
# iterate over the list of known encodings and try to convert the file from each of them
LOCAL_SUPPORTED_ENCODING_FILE_PATH='list_with_supported_encodings.txt'
LOCAL_RESULT_FILE_PATH='result.txt'

for LOCAL_ENCODING in $(cat "$LOCAL_SUPPORTED_ENCODING_FILE_PATH"); do
    # pass the name as an argument, never as the printf format string
    printf '%s  ' "$LOCAL_ENCODING"
    iconv -f "$LOCAL_ENCODING" -t UTF-8 2016-02-08_UPLOAD_CSV.csv.stev > /dev/null 2>&1 && echo "ok: $LOCAL_ENCODING" || echo "fail: $LOCAL_ENCODING"
# use the line below instead of the last one if you also want to see the result on screen
#done | tee "$LOCAL_RESULT_FILE_PATH"
# write the output to the file
done > "$LOCAL_RESULT_FILE_PATH"
# keep only the successful attempts
LOCAL_RESULT_FILE_PATH='result.txt'

grep 'ok:' "$LOCAL_RESULT_FILE_PATH" > "only_ok_$LOCAL_RESULT_FILE_PATH"
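One caveat before going on: an "ok" only says that iconv could map every byte, not that the encoding is the right one. The following is a minimal sketch to illustrate this, not part of the original steps. The helper name decode_with_each and the sample byte are my own assumptions, and it assumes your iconv knows ISO-8859-1 and ISO-8859-15.

```shell
# Hypothetical helper: decode one file with several candidate encodings.
# $1 = file to decode, remaining arguments = encoding names known to iconv
decode_with_each() {
    input_file=$1
    shift
    for candidate_encoding in "$@"; do
        # iconv exits non-zero when a byte is invalid in the given encoding
        if decoded=$(iconv -f "$candidate_encoding" -t UTF-8 "$input_file" 2>/dev/null); then
            echo "$candidate_encoding: $decoded"
        else
            echo "$candidate_encoding: (not decodable)"
        fi
    done
}

# Demo: a file holding the single byte 0xA4 (octal 244)
sample_file=$(mktemp)
printf '\244' > "$sample_file"
decode_with_each "$sample_file" ISO-8859-1 ISO-8859-15
rm -f "$sample_file"
```

Both conversions succeed, but the same byte comes out as a different character each time (the generic currency sign versus the euro sign). That is why the list of "ok" encodings can be long and why a manual check is still required.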
Now comes the hard part: you have to try every "ok" result on the affected file.
# read the result file with the "ok" entries and create one converted version of your broken file per encoding
LOCAL_BROKEN_FILE_PATH='relative/or/full/qualified/file/name.txt'
LOCAL_RESULT_FILE_PATH='only_ok_result.txt'

# sed -e 's/^\(.*\)  ok:.*/\1/' means
# remove everything starting with '  ok:' on each line
# assuming a line looks like "WS2  ok: WS2", the result will look like "WS2"

for LOCAL_ENCODING in $(grep 'ok:' "$LOCAL_RESULT_FILE_PATH" | sed -e 's/^\(.*\)  ok:.*/\1/' | sort -u); do
    # prefix the file name (not the directory part) with the encoding
    LOCAL_CONVERTED_FILE_PATH="$(dirname "$LOCAL_BROKEN_FILE_PATH")/${LOCAL_ENCODING}_$(basename "$LOCAL_BROKEN_FILE_PATH")"
    #echo "$LOCAL_CONVERTED_FILE_PATH"
    # convert from the current candidate encoding, not from a fixed one
    iconv -f "$LOCAL_ENCODING" -t UTF-8 "$LOCAL_BROKEN_FILE_PATH" > "$LOCAL_CONVERTED_FILE_PATH"
done
Open each file and check whether the special characters you care about look right. "WINDOWS-1258" and "CP850" are good blind guesses here.
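If you know a word that must appear in the file (a name with an umlaut, for example), part of the checking can be automated. This is only a sketch under that assumption. The helper name list_candidates_containing, the file names and the search word are made up for illustration.

```shell
# Hypothetical helper: print every candidate file that contains the expected text.
# $1 = a word you know must appear in the correctly converted file
# remaining arguments = the converted candidate files
list_candidates_containing() {
    expected_text=$1
    shift
    for candidate_file in "$@"; do
        # -q: no output, only the exit code; -F: fixed string, no regular expression
        if grep -qF -- "$expected_text" "$candidate_file"; then
            echo "$candidate_file"
        fi
    done
}

# Usage sketch (file names and search word are assumptions):
# list_candidates_containing 'Müller' CP850_name.txt WINDOWS-1258_name.txt
```

Every file printed by the helper is a candidate worth opening. Several encodings may still produce the same text for the characters you searched for, so a final look at the file remains a good idea.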

An unfinished review about the book "Patterns, Principles, and Practices of Domain-Driven Design" by Scott Millett

Summing up my current status in one sentence would yield something like "Still not finished, but already learned and achieved so much".
This entry is about the book "Patterns, Principles, and Practices of Domain-Driven Design" by Scott Millett.

First of all, thank you Scott Millett.

I started reading this book at the end of 2015 and am currently on chapter eleven. That is not because the book is complex. It is because of the essential knowledge shared in each sentence (ok, maybe only each paragraph ;-)).
My approach right now is to read a page and put it into practice right away, whether in the company as a whole, in the team, or in the code.
Since Domain-Driven Design is quite close to everyday behavior and life, I am always welcomed with open arms when I explain an idea to somebody, be it QA, the developers, or the business staff.
It is also great that Scott Millett tells you more than once that Domain-Driven Design is not a silver bullet.

As written above, I am far from having finished this book, but even now (or even a few chapters earlier) I would have signed the sentence "totally worth the money".

Last but not least, thank you Scott Millett.