Generating bilingual dictionaries

Discussion about the Panlexia project
pandunia-guru
Posts: 62
Joined: 2024-09-28 09:28

Generating bilingual dictionaries

Post by pandunia-guru »

I wrote a Python program called write_bilingual_dict.py that writes bilingual dictionaries. It is run from the command line like this:
python3 src/write_bilingual_dict.py <source language code> <target language code>

A practical example:
python3 src/write_bilingual_dict.py pandunia eng

With this command the program reads the files dict/P/pandunia.tsv and dict/E/eng.tsv. Then it combines the rows that have the same concept id in the source language file and the target language file. Finally it writes a bilingual dictionary to the file generated/pandunia-eng.md, which is a Markdown file. The file will look like the one below.

## A

**argente** *n* silver
**aur** *n* gold

## B

**barka** *v* bless
**bina** *v* build
**brad** *n* brother

It would look like this on a website:

Markdown_formatter wrote:

A

argente n silver
aur n gold

B

barka v bless
bina v build
brad n brother

It doesn't look fancy, but it does the job. Later, let's improve the program to generate better-looking dictionaries with more information, such as style, synonyms, morphology and etymology.
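
The combining step described above is essentially a join on the concept id. Here is a minimal sketch of that idea, not the actual code of write_bilingual_dict.py. The column names (id, pos, word), the concept ids (c1, c2, ...) and the function names are made up for illustration, and the real Panlexia TSV files may be laid out differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, simplified file contents: the real Panlexia TSV columns may differ.
SOURCE_TSV = "id\tpos\tword\nc1\tn\targente\nc2\tn\taur\nc3\tv\tbarka\n"
TARGET_TSV = "id\tpos\tword\nc1\tn\tsilver\nc2\tn\tgold\nc3\tv\tbless\nc4\tn\tiron\n"


def read_rows(text):
    """Parse TSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_by_concept(source_rows, target_rows):
    """Pair source and target words that share a concept id."""
    targets = defaultdict(list)
    for row in target_rows:
        targets[row["id"]].append(row["word"])
    entries = []
    for row in source_rows:
        # Concepts that have no word in the target file are simply skipped.
        for translation in targets.get(row["id"], []):
            entries.append((row["word"], row["pos"], translation))
    return entries


def to_markdown(entries):
    """Render entries under one '## X' heading per initial letter."""
    lines = []
    current = None
    for word, pos, translation in sorted(entries, key=lambda e: e[0].casefold()):
        initial = word[0].upper()
        if initial != current:
            if current is not None:
                lines.append("")
            lines.extend([f"## {initial}", ""])
            current = initial
        lines.append(f"**{word}** *{pos}* {translation}")
    return "\n".join(lines)


markdown = to_markdown(join_by_concept(read_rows(SOURCE_TSV), read_rows(TARGET_TSV)))
print(markdown)
```

In this sketch the concept c4 exists only in the target file, so "iron" does not appear in the output. Only concepts present in both files end up in the dictionary.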

Re: Generating bilingual dictionaries

Post by pandunia-guru »

Today I added a shell script for creating all bilingual dictionaries for a language at once. Execute it on Linux (native or WSL) by typing, for example:
sh generate_bilingual_dictionaries.sh pandunia
And it will populate the generated/ folder with lots of simple but nice dictionaries. :)
seweli
Posts: 50
Joined: 2024-09-29 20:49

Re: Generating bilingual dictionaries

Post by seweli »

So it's written in Bash and not in Python 😮

Re: Generating bilingual dictionaries

Post by pandunia-guru »

seweli wrote: 2024-11-09 16:14 So it's written in Bash and not in Python 😮
The Bash script is more like a series of commands. It would be tedious to write the same commands one by one again and again. It's elementary programming that depends on the commands that the operating system and environment offer.
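
For comparison, the same "series of commands" idea can be sketched in Python as a loop that builds one command per target language. This is only an illustration and not the contents of generate_bilingual_dictionaries.sh. The function name build_commands, the way target codes are discovered (scanning dict/*/*.tsv) and the extra language code fra are assumptions.

```python
from pathlib import Path
import tempfile


def build_commands(source, dict_root):
    """Return one write_bilingual_dict.py command per target language file."""
    # Assumption: every language has a file at <dict_root>/<LETTER>/<code>.tsv.
    codes = sorted(path.stem for path in Path(dict_root).glob("*/*.tsv"))
    return [
        ["python3", "src/write_bilingual_dict.py", source, target]
        for target in codes
        if target != source
    ]


# Demonstrate on a throwaway directory tree instead of a real checkout.
with tempfile.TemporaryDirectory() as root:
    for rel in ("P/pandunia.tsv", "E/eng.tsv", "F/fra.tsv"):
        path = Path(root, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    commands = build_commands("pandunia", root)

for command in commands:
    print(" ".join(command))
```

Each command list could then be passed to subprocess.run(command, check=True) to actually generate the dictionaries.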

By the way, I generated dictionaries for Pandunia and uploaded them here. Please don't share the link further, because for now it's only a very small demonstration. However, you can already see that there are lots of new dictionaries that didn't exist for Pandunia before and that, for example, the Mandarin dictionary has romanization in Pinyin and the Telugu dictionary has pronunciation in IPA.

I will announce these dictionaries when they are bigger. We have to create more concept ids and I have to add more of them to Pandunia's dictionary to get more words in.

Re: Generating bilingual dictionaries

Post by seweli »

It works!

Image

Re: Generating bilingual dictionaries

Post by pandunia-guru »

Thanks! :D

By the way, your screenshot shows bad alphabetical sorting. Upper-case "I" comes before lower-case "b"! Python's default string comparison puts upper-case ASCII letters before lower-case ones, so I had to add a little code for case-insensitive sorting.

# Sort the words in a case-insensitive way.
sorted_dict = sorted(dict, key=lambda s: s[0].casefold())
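
To show the difference, here is a small self-contained comparison of default sorting and casefold-based sorting. The entries are made-up examples, and their shape (headword, part of speech, translation) is an assumption about what each item in the list looks like.

```python
# Hypothetical entries shaped as (headword, part of speech, translation).
entries = [("brad", "n", "brother"), ("India", "n", "India"), ("aur", "n", "gold")]

# Default sorting compares code points, and "I" (U+0049) precedes "a" (U+0061).
default_order = [entry[0] for entry in sorted(entries)]

# casefold() normalizes case before comparison, giving dictionary order.
folded_order = [entry[0] for entry in sorted(entries, key=lambda e: e[0].casefold())]

print(default_order)  # ['India', 'aur', 'brad']
print(folded_order)   # ['aur', 'brad', 'India']
```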