Mapping ids (Concepticon, WOLD, NELex)


Re: Mapping ids (Concepticon, WOLD, NELex)

by seweli » 2025-01-13 06:21

Fantastic!

So the master file is no longer named "master" but id_map, which is clearer. It has 3655 rows of concepts and a completely filled id key column, linked in the first place mostly to the second column, PWN_id (for Princeton WordNet) 👍

And this id_map table no longer contains the English definitions, which is a good idea, because it's much more readable and potentially neutral, even if I will need to connect a second screen to see them in parallel 😜

And there is a Concepticon file for French 🥖 Very useful for me.

So good work, thanks! It's the moment to use the English expression "You made my day"!

Re: Mapping ids (Concepticon, WOLD, NELex)

by pandunia-guru » 2025-01-11 18:44

Concept id mapping is now in data/id_map.tsv.

Today I found out that Concepticon also includes translations; see the files in the mappings folder. :o

Re: Mapping ids (Concepticon, WOLD, NELex)

by seweli » 2024-10-16 21:31

Perfect.

I can edit the .tsv file easily with Google Sheets, and you said on Discord that it works with LibreOffice too.

It also works with Excel. I only have to open the file in an editor, select all, copy, and paste into Excel, after setting all the columns to text format.

Re: Mapping ids (Concepticon, WOLD, NELex)

by pandunia-guru » 2024-10-13 20:12

I created a new workspace file for creating final concept ids. The file is data/master.tsv, and it includes final ids, initial ids, meaning definitions from Concepticon and ULD, and ids from Concepticon, NELex, ULD and WOLD.

ULD (Universal Language Dictionary) is a new resource in this project. It was created by Rick (Rich) Harrison, who was a notable person in the auxiliary language scene in the 1990s. An example of this list is on Lingwa de Planeta's website. It includes translations to Esperanto, Novial, Lingwa de Planeta and Sambahsa. ULD is a logical tool for Panlexia to reach out to the auxiliary language scene. With the inclusion of this data, Panlexia now has its first translations to constructed languages. :)

Re: Mapping ids (Concepticon, WOLD, NELex)

by pandunia-guru » 2024-10-10 12:32

I did it and uploaded the file to data/WOLD/forms_fra_spa_deu_rus.tsv.

Re: Mapping ids (Concepticon, WOLD, NELex)

by pandunia-guru » 2024-10-10 11:48

I downloaded the WOLD data from https://wold.clld.org/download. I think that it's the same data that is shown on the website.

Now I have also downloaded the zip file that you mentioned. Unfortunately translation.csv doesn't include WOLD ids. How silly! :D But the numbers in the meaning_pk column match the numbers in the pk column of the parameter.csv file, which includes a WOLD id column. So with a little programming we can now get translations for all WOLD words in French, Spanish, German and Russian.

Thanks! :D
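
The "little programming" mentioned above could look roughly like the sketch below. Only meaning_pk (in translation.csv) and pk (in parameter.csv) are column names taken from this post. The names "id", "language" and "name" for the other columns are assumptions and should be checked against the real headers in the zip file.

```python
import csv
import io


def attach_wold_ids(parameter_csv, translation_csv,
                    id_column="id", lang_column="language", word_column="name"):
    """Join translations to WOLD ids via meaning_pk -> pk.

    Both arguments are the CSV file contents as text. The default
    column names other than "pk" and "meaning_pk" are assumptions.
    Returns a list of (wold_id, language, word) tuples.
    """
    # Lookup table from the numeric primary key to the WOLD id.
    pk_to_wold = {
        row["pk"]: row[id_column]
        for row in csv.DictReader(io.StringIO(parameter_csv))
    }
    joined = []
    for row in csv.DictReader(io.StringIO(translation_csv)):
        wold_id = pk_to_wold.get(row["meaning_pk"])
        if wold_id is None:
            continue  # translation points at an unknown meaning, skip it
        joined.append((wold_id, row[lang_column], row[word_column]))
    return joined
```

The dictionary lookup keeps this to a single pass over each file, so neither file needs to be sorted first.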

Re: Mapping ids (Concepticon, WOLD, NELex)

by seweli » 2024-10-09 21:05

Thanks for the explanation.

In
https://wold.clld.org/meaning
we see clearly
LWT code Meaning
1.1 the world
1.21 the land
etc.

If we click on Meaning 1.1
https://wold.clld.org/meaning/1-1#2/24.3/-4.8
we see
1 Swahili dunia
1 Swahili ulimwengu
2 Iraqw yaamu
3 Gawwada ʔalame
4 Hausa dúuníyàa
etc.

And we find the same information in your files
https://github.com/barumau/panlexia/blo ... /forms.csv
https://github.com/barumau/panlexia/blo ... guages.csv

What I don't understand is where you found your files,
because I didn't find them in
https://github.com/clld/wold2/blob/master/data/data.zip

Furthermore, I notice that the zip contains a file translation.csv, with words only for French, Spanish, German and Russian, and that they are not in your file forms.csv.

Re: Mapping ids (Concepticon, WOLD, NELex)

by seweli » 2024-10-08 21:27

Thanks a lot. I begin to understand. I need to see the file that contains the Swahili translations to connect my neurones.

Just to say: during the night, for a human, there's rarely something more precious than sleeping. No databasing at night. Take care of yourself ✊

Re: Mapping ids (Concepticon, WOLD, NELex)

by pandunia-guru » 2024-10-08 17:14

seweli wrote: 2024-10-07 22:40 It remains a big question: how to get the files of the WOLD translations, and how to use it.
WOLD translations are in data/WOLD/forms.csv. The first four columns are ID, Language_ID, Parameter_ID, Form. Language_ID identifies languages according to data/WOLD/languages.csv. For example, 1 is Swahili and 123 is French. Parameter_ID identifies the concept according to data/WOLD/parameters.csv. For example, 1-1 means 'the world', 1-21 means 'the land' and so on.

First I will sort that file by language id, so that all translations for each language form a continuous section. Then it will be easier to extract all words for each language at once, first Swahili, then Iraqw, etc. I will write a program that iterates through all languages one by one, extracts the words for each language and writes them to a file, for example swh.tsv and irk.tsv. Each file will be made up of rows that consist of a Panlexia concept id, a tab and the word for the concept in that language. The program will be a little longer and a little more complex than the Python programs that I have uploaded so far. It's maybe one night of work for me.
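
A minimal sketch of that plan, not the actual program. The column names Language_ID, Parameter_ID and Form are the ones listed above. The two lookup tables are assumptions: wold_to_panlexia (WOLD Parameter_ID to Panlexia concept id) and language_codes (WOLD Language_ID to a file name such as swh) would have to be built from the id map and from languages.csv.

```python
import csv
from collections import defaultdict


def group_forms_by_language(forms_rows, wold_to_panlexia):
    """Group WOLD forms by Language_ID as (panlexia_id, form) pairs.

    forms_rows is any iterable of dicts, e.g. a csv.DictReader over
    data/WOLD/forms.csv. wold_to_panlexia is a hypothetical dict.
    """
    by_language = defaultdict(list)
    for row in forms_rows:
        panlexia_id = wold_to_panlexia.get(row["Parameter_ID"])
        if panlexia_id is None:
            continue  # this concept has no Panlexia id yet
        by_language[row["Language_ID"]].append((panlexia_id, row["Form"]))
    return by_language


def write_language_files(by_language, language_codes, out_dir="."):
    """Write one <code>.tsv per language: concept id, tab, word."""
    for language_id, pairs in by_language.items():
        # Fall back to the numeric WOLD id if no code is known.
        code = language_codes.get(language_id, language_id)
        path = f"{out_dir}/{code}.tsv"
        with open(path, "w", encoding="utf-8", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            writer.writerows(sorted(pairs))
```

Because the rows are collected in a dictionary keyed by language, this variant would work even without sorting the file first. Sorting is still useful if the file should be processed one language section at a time.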

I can use the same program with small modifications for extracting dictionaries from NorthEuraLex. The data file (northeuralex-0.9-forms.tsv) is already organized so that each language forms one section in the file. The file also includes IPA transcriptions, which will be written to a separate file.

Re: Mapping ids (Concepticon, WOLD, NELex)

by pandunia-guru » 2024-10-08 16:47

seweli wrote: 2024-10-07 23:21 It seems you map "property" to "A".
...
You may keep the WOLD column "Property Category" when WOLD is available, since it distinguishes Adjective and Adverb. Or not?
That program reads only Concepticon data, so it can access only the "property" parameter. I should write more code to get the "Adverb" parameter from WOLD. I can do it, though.
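
The extra code could be as small as the sketch below: take the WOLD category when a concept has one, and fall back to the Concepticon one otherwise. Only the mapping of "property" to "A" comes from this thread. The tag "ADV" and the exact category strings are assumptions, not what the existing program uses.

```python
# Only "property" -> "A" is from the thread; everything else here
# is a hypothetical placeholder.
CONCEPTICON_TAGS = {"property": "A"}
WOLD_TAGS = {"Adjective": "A", "Adverb": "ADV"}


def pos_tag(concepticon_category, wold_category=None):
    """Prefer the finer WOLD category, else fall back to Concepticon.

    Returns None when neither source gives a known category.
    """
    if wold_category in WOLD_TAGS:
        return WOLD_TAGS[wold_category]
    return CONCEPTICON_TAGS.get(concepticon_category.lower())
```

For a concept that exists only in Concepticon, pos_tag("property") still gives "A", so the current behaviour would not change where WOLD data is missing.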
