啊天堂: A Tian's Language Blog

Custom Chinese Frequency Dictionaries for Pleco/Web

I created a script that generates a custom frequency dictionary for use with Pleco or with a web page I wrote. Here is what it looks like in use:

Pleco version

In this example, I tapped on the word 麻雀 (sparrow) while reading a book, then switched to the custom dictionary, which shows how common the word is in several books and subtitle files I provided to my script.

It shows that 麻雀 appears 5+ times in the entry "HZ" (活着), 2-4 times in "WitchDollVow" (a web novel I plan to read), and 1 time in "WILL" (a visual novel I'm playing).
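To make the bucketing concrete, here is a minimal Python sketch of how counts could be turned into the "5+", "2-4", and "1" labels shown above. It is not the actual script: the function names `bucket` and `frequency_entry`, the toy texts, and the plain substring counting (no word segmentation) are all assumptions made for illustration.

```python
def bucket(count: int) -> str:
    """Map a raw occurrence count to one of the display buckets from the post."""
    if count >= 5:
        return "5+"
    if count >= 2:
        return "2-4"
    if count == 1:
        return "1"
    return "0"


def frequency_entry(word: str, sources: dict[str, str]) -> dict[str, str]:
    """Return {source label: bucket} for every source that contains the word."""
    entry = {}
    for label, text in sources.items():
        # Assumption: a plain substring count stands in for real word segmentation.
        count = text.count(word)
        if count > 0:
            entry[label] = bucket(count)
    return entry


# Hypothetical toy texts standing in for full books and subtitle files.
sources = {
    "HZ": "麻雀" * 6 + "牛",
    "WitchDollVow": "麻雀飞过,麻雀又飞回来",
    "WILL": "一只麻雀",
    "Other": "没有这个词",
}

print(frequency_entry("麻雀", sources))
# {'HZ': '5+', 'WitchDollVow': '2-4', 'WILL': '1'}
```

Sources where the word never appears are left out of the entry, which matches the screenshots only listing the places a word was found.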

Based on this information, I can decide whether 麻雀 is worth learning or sentence mining. Here, it shows up often in one book but rarely in the other content I want to consume, so I would choose not to learn it for now. I'll either pick it up just by immersing in 活着, or learn it later when it comes up in other content.

Web version

In this example, 牛 shows up 5+ times in the yellow entries, 2-4 times in the green entries, and 1 time in the gray entries.

Instructions and an explanation are here:

https://github.com/Attenius/frequency_dictionary_viewer/

You can run the script using the Google Colab notebook linked there.

You can use the generated dictionaries with this site:

https://attenius.github.io/frequency_dictionary_viewer/

In the future, I will provide a couple of ready-made files, with entries for popular learner content like 活着 and Peppa Pig, so people can use them without having to generate their own.

All of this has only been tested with simplified characters and only with Mandarin.