I always try to keep one fun project for myself that consumes around 10% of my time. Over the years I have dabbled in many different software development projects. Occasionally I hit on something that is really useful and worthwhile, while a good part of the time I am just having fun. 🙂 But hopefully this project will be useful to some other people too! Already I am getting requests from people who want to hear a specific dialect, and I have lots of volunteers helping me find speakers. This is going to be fun!
This project involves getting people from various parts of China to speak some sentences in their dialect. While Chinese has always used the same character set, allowing people from different places to communicate with each other in writing, the diversity of the spoken language is astounding, with literally hundreds of different dialects. The word “dialect” is often used rather than “language” mostly because of the shared characters, and perhaps for some political reasons too (e.g. the philosophy that we are all united as one people through language).
Starting sometime in the 1950s or so, an effort was made to begin standardizing the spoken language. A big conference was held, and they voted on which dialect would become the standard. Mandarin won, but Sichuanese was a very close second, off by just a couple of votes. Ever since then, Mandarin has been required to be taught in all schools as the primary spoken language. At the same time, people still learn and speak their dialects at home and in their communities, but ever so slowly, many dialects are fading.
Hence I started this project to collect some people’s voices, and perhaps at some future time (really, we can just about do it now) we can use artificial intelligence to create a chat bot that can speak the dialect, and thereby preserve the language. Right now, I am just collecting some voice samples. For example, you can compare 21 sentences spoken in Hunan dialect to the same sentences spoken in Chaoshan dialect, or even in Cantonese. Maybe you can tell which one sounds more like Cantonese?