Friday, September 23, 2011

I bought a new Amazon Kindle 3G and thought I could read some Turkish clippings in order to learn the language. But Turkish has so many verb forms, and so many cases for both nouns and adjectives (arguably as many as our Russian), that I soon found it really hard. Then I searched the net for a Turkish-to-English dictionary that works inside the Kindle. There are some online ones and some for Windows, but to my disappointment I couldn't find a good one in MOBI format. So I tried to make one myself, and after a few days I succeeded.

First, I downloaded a Babylon dictionary installer and extracted the much-needed BLG file from it. Then I ran UnpackBLG.exe under Wine to extract a TXT file (option "Terms - Outside"). This TXT file was processed by two Perl scripts. The first (A) joins the lines of each entry into one tab-delimited line, so the blank separator lines disappear, and replaces the garbled characters in the UnpackBLG.exe output with the correct UTF-8 letters. The second (B) splits pipe-delimited verb forms and case forms into separate lines:


A:

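#!/usr/bin/perl
# A: join each entry into one "term<TAB>definition" line and repair the
# Turkish letters. UnpackBLG.exe writes the text as single-byte Windows-1254,
# so when it is read as Latin-1 the Turkish-only letters look like þ ý ð Þ Ý Ð.
use strict;
use warnings;
use utf8;                               # the literals below are Unicode text
binmode STDIN,  ':encoding(latin1)';    # one byte -> one character
binmode STDOUT, ':encoding(UTF-8)';     # write real UTF-8

while (my $line1 = <STDIN>) {
    # non-empty line: its newline becomes a tab; a blank line ends the entry
    $line1 =~ s/\n/\t/g if ($line1 =~ /(.+)\n/);
    # s cedilla, dotless i, g breve, and their capitals (Ş, dotted İ, Ğ);
    # ü ö â ç é and their capitals are the same bytes in Latin-1 and cp1254,
    # so the two encoding layers above already convert them correctly
    $line1 =~ tr/þýðÞÝÐ/şığŞİĞ/;
    print $line1;
}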
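
Every replacement in script A maps a byte, as it looks in Latin-1, to the letter that the same byte stands for in Windows-1254 (Turkish), written out as UTF-8. If the UnpackBLG.exe output really is cp1254 (an assumption inferred from that table), then Perl's standard Encode module could do the same repair in a single call. The sketch below shows that alternative. It is not what I actually ran. `fix_turkish` and `fix_with_table` are illustrative names, and the sketch checks that the two approaches agree.

```perl
#!/usr/bin/perl
# Sketch: repair UnpackBLG.exe output by decoding it as Windows-1254.
# Assumption: the raw file is cp1254; fix_turkish/fix_with_table are
# illustrative names, not part of any tool used above.
use strict;
use warnings;
use utf8;                          # literals below are Unicode characters
use Encode qw(decode);

# One-step repair: let Encode interpret the raw bytes as cp1254.
sub fix_turkish {
    my ($raw) = @_;
    return decode('cp1254', $raw);
}

# Script A's idea: read bytes as Latin-1, then swap the six look-alikes.
sub fix_with_table {
    my ($raw) = @_;
    (my $text = decode('latin1', $raw)) =~ tr/þýðÞÝÐ/şığŞİĞ/;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';

# "şeker ışık güneş İstanbul dağ" as cp1254 bytes
my $raw = "\xFEeker \xFD\xFE\xFDk g\xFCne\xFE \xDDstanbul da\xF0";
print fix_turkish($raw), "\n";     # şeker ışık güneş İstanbul dağ
print fix_turkish($raw) eq fix_with_table($raw) ? "same\n" : "different\n";
```

The decoder also converts any other non-ASCII byte properly, instead of passing it through as an invalid single byte.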

B:

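#!/usr/bin/perl
# B: split pipe-delimited forms ("form1 | form2 | ...") into separate
# entries; the first form keeps the definition, the others point to it.
use strict;
use warnings;
binmode STDIN,  ':encoding(UTF-8)';     # output of script A
binmode STDOUT, ':encoding(UTF-8)';

while (my $line1 = <STDIN>) {
    if ($line1 =~ /(.+) \| (.+)/) {
        my ($pipes, $meaning) = split /\t/, $line1;   # A separates fields with tabs
        my @list = split / \| /, $pipes;

        my $counter = 0;
        foreach my $line (@list) {
            my $str = '';
            if (!$counter) {
                $str = $line . "\t" . $meaning . "\n";   # first form: the definition
            } else {
                $str = $line . "\t" . $list[0] . "\n";   # other forms: the first form
            }
            $counter++;
            print $str unless ($str =~ /\//);           # skip lines containing '/'
        }
    } else {
        print $line1;
    }
}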
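
To make script B's rule easier to check, here is the same logic wrapped in a function and applied to an invented sample entry. The function name, `split_forms`, is hypothetical. It splits on the tab that script A puts between term and definition. Like B, it drops any output line that contains a slash.

```perl
#!/usr/bin/perl
# Sketch of script B's rule as a function; split_forms is a hypothetical
# name and the sample entry below is invented.
use strict;
use warnings;

# Takes one "form1 | form2 | ...<TAB>definition" line and returns the
# output lines: the first form keeps the definition, every other form
# points back to the first form. Lines without " | " pass through.
sub split_forms {
    my ($line1) = @_;
    return ($line1) unless $line1 =~ /.+ \| .+/;
    my ($pipes, $meaning) = split /\t/, $line1;
    my @list = split / \| /, $pipes;
    my @out  = ("$list[0]\t$meaning\n");
    push @out, "$_\t$list[0]\n" for @list[1 .. $#list];
    return grep { !m{/} } @out;    # as in B, drop lines containing '/'
}

# ev = house, evi = its accusative, evler = its plural
print split_forms("ev | evi | evler\thouse\t\n");
```

The sample produces three entries: `ev` with the definition `house`, and `evi` and `evler`, each with `ev` as their "definition".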

tab2opf.py choked on a couple of junk lines left by UnpackBLG, so I removed them by hand and then ran:

python tab2opf.py -utf turk.txt >err 2>&1
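
Hunting for the junk lines by hand can be tedious, so a rough pre-check may help. The sketch below assumes that tab2opf.py accepts only lines of the form "term, tab, definition" with both parts non-empty. That criterion is a guess and is not taken from tab2opf.py itself. `bad_lines` is a hypothetical helper.

```perl
#!/usr/bin/perl
# Hypothetical pre-check for tab2opf.py input. Assumption: a usable line
# is "term<TAB>definition" with both parts non-empty; anything else is
# reported as a candidate junk line.
use strict;
use warnings;

# Returns the 1-based numbers of suspicious lines; blank lines are ignored.
sub bad_lines {
    my (@lines) = @_;
    my @bad;
    for my $i (0 .. $#lines) {
        my $line = $lines[$i];
        next if $line =~ /^\s*$/;
        my ($term, $def) = split /\t/, $line, 2;
        push @bad, $i + 1
            unless length $term && defined $def && $def =~ /\S/;
    }
    return @bad;
}

# Usage: perl check_tab.pl turk.txt  (the script name is arbitrary)
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "cannot open $ARGV[0]: $!";
    my @lines = <$fh>;
    close $fh;
    print "suspicious line $_\n" for bad_lines(@lines);
}
```

Only the reported line numbers need a manual look, rather than the whole file.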

In the end I got an .opf file and 138 XML files of about 2 MB each to feed to mobigen:

wine mobigen/mobigen.exe turk.opf -unicode

On my dual-core desktop it took an hour of hard-drive rattling, and I was afraid I would lose the disk, but in the end it worked, thank God.

Here is the zipped dictionary (40 MB): http://82.146.44.218/turk.zip


Put the dictionary file directly into the Kindle's documents directory over USB (no conversion required), then choose it as the primary dictionary for Turkish texts. You will usually need to press the button twice. The first press reduces an inflected form to its base form (the infinitive), and the second gives the meaning of that form. I think this is the handiest way.
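
Two presses are needed because the entries written by script B form a two-level chain. An inflected form's "definition" is its base form, and only the base form carries the English meaning. Below is a toy Perl model of that chain. The entries are invented, and `lookup_chain` is a hypothetical helper; nothing here is part of the Kindle software.

```perl
#!/usr/bin/perl
# Toy model of the two-press lookup. The entries are invented; they mimic
# what script B writes: inflected forms point to the base form, and only
# the base form carries the English meaning. lookup_chain is hypothetical.
use strict;
use warnings;

my %dict = (
    'ev'    => 'house',   # base form: the real definition
    'evi'   => 'ev',      # inflected forms: "definition" is the base form
    'evler' => 'ev',
);

# Follows the dictionary at most $max_steps times, one step per "press",
# and returns what each press would show.
sub lookup_chain {
    my ($word, $max_steps) = @_;
    my @shown;
    while ($max_steps-- > 0 && exists $dict{$word}) {
        $word = $dict{$word};
        push @shown, $word;
    }
    return @shown;
}

print join(' -> ', 'evler', lookup_chain('evler', 2)), "\n";   # evler -> ev -> house
```

A base form such as `ev` needs only one press, because its entry already holds the meaning.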

By the way, do you know how to clip web sites to your Kindle? See where I clip Turkish newspapers to read on my Kindle.