Why is it so hard to type in Indigenous languages?
When it comes to digital access and internet technologies, some languages are still more equal than others. Speakers of majority languages, who type in English or text in Korean, assume their message will be transmitted accurately. But Indigenous language communities don’t share that confidence.
Computers and smartphones don’t come with the ability to type all letters in all languages. The unique characters integral to many Indigenous languages are often mangled as they travel across the ether.
However, the inclusion of two capital letters needed to write Haíɫzaqvḷa in a recent update of the Unicode Standard means this Indigenous language can finally be written and read on all digital platforms.
Why did it take so long? And what challenges do Indigenous communities face when wanting to type in their languages?
Haíɫzaqv: “to act and speak correctly as a human being”
Haíɫzaqvḷa is the language of the Heiltsuk (Haíɫzaqv) Nation, whose traditional homeland is Bella Bella, British Columbia. The language has had its own orthography (an agreed written form with established spelling conventions) since the 1970s.
Tribal leadership invited a Dutch linguist to work in partnership with native speakers to document their increasingly endangered language and develop learning resources. The results of this collaborative work included an alphabet chart, storybooks and a dictionary.
Before the advent of digital technologies, Indigenous communities used specially modified typewriters to represent their languages in print. Customized typewriters designed to support the Latin, Syllabics and Cherokee scripts allowed users to publish in Indigenous languages like Haíɫzaqvḷa.
Two new letters were added to the Unicode Standard to write Haíɫzaqvḷa.
The digital divide
The digital age has created many opportunities and some new challenges. The American Standard Code for Information Interchange (ASCII), one of the earliest computer text encoding standards, was first published in 1963. With room for only 128 characters, it did not support 44 of the 129 letters in the Haíɫzaqvḷa orthography. Special fonts and keyboards were required to render these characters on early desktop computers.
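To make the gap concrete, the short Python sketch that follows lists the characters in the language's own name that fall outside ASCII's 128-character range, then shows what a strict and a lossy ASCII encoder do with them. It assumes the name is stored with precomposed code points (í as U+00ED, ɫ as U+026B, ḷ as U+1E37); the article does not specify how the text is encoded.

```python
import unicodedata

# Assumption: precomposed code points are used for í, ɫ and ḷ.
WORD = "Ha\u00ed\u026bzaqv\u1e37a"  # Haíɫzaqvḷa


def non_ascii_letters(text: str) -> list[tuple[str, str, str]]:
    """Return (character, code point, Unicode name) for every character above ASCII's 0-127 range."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in text
        if ord(ch) > 127
    ]


for ch, cp, name in non_ascii_letters(WORD):
    print(ch, cp, name)
# í U+00ED LATIN SMALL LETTER I WITH ACUTE
# ɫ U+026B LATIN SMALL LETTER L WITH MIDDLE TILDE
# ḷ U+1E37 LATIN SMALL LETTER L WITH DOT BELOW

# A strict ASCII encoder refuses the text outright.
try:
    WORD.encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII encoding failed:", err)

# A lossy fallback silently replaces each unsupported letter with "?".
print(WORD.encode("ascii", errors="replace"))  # b'Ha??zaqv?a'
```

Three of the ten characters in the name alone have no ASCII representation, which is why fonts and keyboards had to work around the encoding rather than within it.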
Designers around the world produced countless fonts to support typing in digitally under-resourced languages, each relying on its own font-keyboard pairing to encode a specific language.
But this system had a major weakness: when files using custom fonts were shared, both the creator and the recipient needed the same font installed on their devices. And if the recipient wanted to reply, they needed a keyboard input system paired with that same custom font. Without these elements in place, the missing characters would appear as “tofu” (the blank boxes a device displays when no installed font has a glyph for a character) or, worse yet, as a random string of meaningless characters.
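The font-keyboard pairing can be sketched in a few lines of Python. The mapping here is entirely hypothetical, since each legacy font chose its own stand-in characters, but the principle is the same: the file stores ordinary ASCII characters, and only the custom font turns them into Haíɫzaqvḷa letters on screen.

```python
# Hypothetical custom font: it draws a Haíɫzaqvḷa glyph in place of an
# ordinary ASCII character. This mapping is invented for illustration.
CUSTOM_FONT_GLYPHS = {
    "}": "\u00ed",  # the font draws í where the file stores "}"
    "$": "\u026b",  # the font draws ɫ where the file stores "$"
    "@": "\u1e37",  # the font draws ḷ where the file stores "@"
}

# What the matching (hypothetical) custom keyboard actually writes into the file.
STORED_TEXT = "Ha}$zaqv@a"


def convert_to_unicode(text: str) -> str:
    """Replace each stand-in character with the real Unicode letter it represents."""
    return "".join(CUSTOM_FONT_GLYPHS.get(ch, ch) for ch in text)


def render(text: str, font_installed: bool) -> str:
    """Simulate what a reader sees on screen, with or without the custom font."""
    if font_installed:
        return convert_to_unicode(text)  # the font visually swaps in the right glyphs
    return text  # without the font, the raw stand-ins show through


print(render(STORED_TEXT, font_installed=True))   # Haíɫzaqvḷa
print(render(STORED_TEXT, font_installed=False))  # Ha}$zaqv@a
```

Converting such files once, as `convert_to_unicode` does here, stores real code points instead of stand-ins, so the text no longer depends on any particular font being installed.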
The Unicode Standard’s goal is to represent, in digital form, every character required by the world’s languages and writing systems. As of version 16.0, Unicode defines 154,998 characters covering 168 scripts, and it has become the dominant standard for digital character encoding. Yet until that version, released in September 2024, two capital letters needed to write Haíɫzaqvḷa remained absent.
Encoding Haíɫzaqvḷa
Through a partnership between Heiltsuk Revitalization, the University of British Columbia and the international type design company Typotheque, we have been working to ensure that each and every Haíɫzaqvḷa character is consistently represented and accurately reproduced on all digital platforms and devices.
Before this community-led collaboration, it was not possible to fully encode Haíɫzaqvḷa in digital text. This meant that community members couldn’t access the full range of characters they needed to input their language digitally. That would be like typing English without having access to capital E or S, and relying on workarounds like Σ for E or ∫ for S.
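The analogy can be checked directly: to a computer, a lookalike is an entirely different character, not a variant of the letter it imitates, so comparison, case conversion and search all treat it as something else. A minimal Python sketch using the ∫-for-S example:

```python
import unicodedata

real = "STOP"
workaround = "\u222bTOP"  # "∫TOP": the integral sign standing in for a missing capital S

print(real == workaround)               # False: different code points
print(unicodedata.name(workaround[0]))  # INTEGRAL
print(real.lower())                     # stop
print(workaround.lower())               # ∫top (the integral sign has no lowercase form)
print("stop" in workaround.lower())     # False: a search for the word fails
```

The same breakage applies to any community forced to substitute a lookalike for a letter missing from the encoding.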
A stop sign in Haíɫzaqvḷa outside the Wáglísla Band Store in Bella Bella, B.C., on Nov. 8, 2024.
(Sara Shneiderman)
Ensuring accurate character encoding that is predictable on all operating systems is a cornerstone of language justice. Yet the burden is still on communities to petition Unicode to have their scripts included, and the process is exacting.
Harder still, a proposal must consider whether the proposed additions might affect other languages that use the same script, and then navigate and mitigate any potential conflicts. The stakes are high for changes to the encoding standard: decisions are almost impossible to reverse, because the standard must remain stable and maintain both backward and forward compatibility.
Important projects like the Script Encoding Initiative have for decades been helping communities prepare technical proposals for encoding scripts and characters that Unicode does not yet support. There is still much work to be done.
Language rights and government documents
’Cúagilákv — also known as Jess H̓áust̓i — is a Haíɫzaqv leader, parent, educator and poet from Bella Bella. In 2021, H̓áust̓i approached Canadian government agencies, both provincial and federal, to change Haíɫzaqv identification documents to remove colonial anglicizations and reclaim the correct spelling of their name.
H̓áust̓i was informed that the existing backend systems were unable to accommodate the representation of diacritic marks.
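What a system that cannot store diacritics ends up doing to a name can be sketched in Python. The `strip_diacritics` function is a hypothetical stand-in for such a backend, not any agency's actual code, and the sketch assumes the glottalization marks in H̓áust̓i are encoded as U+0313 COMBINING COMMA ABOVE; the article does not say which code point is used.

```python
import unicodedata

# Assumption: glottalization marks are U+0313 COMBINING COMMA ABOVE.
NAME = "H\u0313a\u0301ust\u0313i"  # H̓áust̓i


def strip_diacritics(text: str) -> str:
    """Hypothetical ASCII-only backend: decompose, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics(NAME))  # Hausti
```

Every mark that distinguishes the name's sounds is silently discarded, leaving an anglicized spelling much like the one the name's owner is trying to correct.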
“The reason why I have an incorrect name is because it was anglicized by Indian agents. I didn’t create the problem, but I’m not getting any help to fix that,” H̓áust̓i told CBC News in 2021. “I feel that it’s important to honour my ancestors and my language by spelling and pronouncing it correctly. I would love for my children to grow up with the correct spelling of their name on their ID.”
The ability to fully encode Haíɫzaqvḷa in the Unicode Standard means the language can now be input into any Unicode-compliant system. This is a baseline requirement for eliminating many remaining digital language barriers.
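A Unicode-aware system is also expected to treat canonically equivalent spellings of the same text as equal, even when they use different code point sequences. The sketch that follows, under the same assumption that the name's glottalization marks are U+0313 COMBINING COMMA ABOVE, shows that á may arrive either as one code point or as a plus a combining accent, and that NFC normalization from Python's standard `unicodedata` module reconciles the two.

```python
import unicodedata

# Assumption: glottalization marks are U+0313 COMBINING COMMA ABOVE.
precomposed = "H\u0313\u00e1ust\u0313i"  # á as a single code point, U+00E1
decomposed = "H\u0313a\u0301ust\u0313i"  # á as "a" + U+0301 COMBINING ACUTE ACCENT


def nfc(text: str) -> str:
    """Normalize to Unicode Normalization Form C (composed where a composed form exists)."""
    return unicodedata.normalize("NFC", text)


print(precomposed == decomposed)            # False: different code point sequences
print(nfc(precomposed) == nfc(decomposed))  # True: canonically equivalent
# H̓ and t̓ have no precomposed forms, so their marks remain separate code points.
print(len(nfc(decomposed)))                 # 8
```

Normalizing on input and before comparison is a common design choice: a name typed on one keyboard will then match the same name typed on another.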
Haíɫzaqv Chiefs (from left to right) Frank Brown, Ian Reid, Gary Housty, Carrie Easterbrook and Crystal Woods at the opening of the Haíɫzaqv Language Building on Nov. 8, 2024, in Bella Bella, B.C.
(Sara Shneiderman)
Beyond bilingualism
Canada is fond of celebrating its commitment to bilingualism. Extensive provisions are in place to support English and French. But these colonial languages originated in Europe and were brought by settlers who first traded and then colonized, and both have vibrant speech communities in their original homelands and around the globe.
In 2019, the Canadian government passed the Indigenous Languages Act, designed to support the revitalization, maintenance and strengthening of the languages Indigenous to this land.
As Canada works to implement the United Nations Declaration on the Rights of Indigenous Peoples, it should also simultaneously realize the slogan of the Unicode Consortium: “everyone in the world should be able to use their own language on phones and computers.”
The challenges to achieving universal encoding for historically marginalized languages are no longer technical; they are bureaucratic and political. In 2009, Canada’s then Commissioner of Official Languages, Graham Fraser, was quoted as saying:
“In the same way that race is at the core of … American experience and class is at the core of British experience, I think that language is at the core of Canadian experience.”
By ensuring linguistic justice for all of its citizens, Canada can exercise global leadership in language policy and planning.
This article was co-authored by Bridget Chase, a language technologist and researcher, and Kevin King, a typeface designer at Typotheque.