Cyrillic Letters Rejected by Certain Fields
Posted
#1647
(In Topic #375)
So far I had problems with the "URL Moniker" field and the "Codename" field when I edit pages.
Letters like Р, М, Ч cause the fields to turn red and the form cannot be submitted.
Posted
Yes, these particular fields need to be ASCII; that's an intentional limitation. But they're not visible content fields, so this should be okay. These fields basically form part of the URL, which by convention contains only ASCII characters (most web software assumes that).
Posted
I managed to trick Compo into accepting them and my site works just fine. All major browsers navigate the Cyrillic URLs without problems, and google.bg indexes my site better because of the more descriptive, keyword-rich page names.
Maybe you would consider lifting this limitation, or at least providing an add-on that removes it for those who want to work with Cyrillic.
Posted
(I transliterated your name)
You may know more than me about some aspects of this. My main concern is that URL encoding, which is one of the original web standards, is not going to play nicely.
For example, if you put your Cyrillic name into here it gets mangled:
Online urlencode() function - Online PHP functions
That's why we're normally using transliteration instead.
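To make the "mangling" concrete, here is a minimal sketch in Python rather than PHP (Python's `urllib.parse.quote` behaves like `urlencode()` for this purpose). The sample string just reuses the letters from the first post; it is not anyone's real moniker.

```python
from urllib.parse import quote

# Sample text: the three letters mentioned in the first post.
cyrillic = "РМЧ"

# Each Cyrillic letter is two bytes in UTF-8, and every byte becomes
# a %XX escape, so 3 visible letters turn into 18 ASCII characters.
encoded = quote(cyrillic)

print(encoded)       # %D0%A0%D0%9C%D0%A7
print(len(encoded))  # 18
```

So the link still works, but it is unreadable wherever the raw form is shown.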
It may be okay for the main URL path, outside the GET parameters (?foo=bar&something=false kind of stuff.)
It may be that even for GET parameters some people do reduced URL encoding, only converting certain core symbols (?, &, = for example).
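A rough sketch of what such "reduced" encoding could look like, in Python. The function name `reduced_quote` and the exact set of escaped symbols are my own assumptions for illustration, not something taken from any particular software.

```python
from urllib.parse import quote

# Hypothetical set of characters that would break query-string parsing.
# Everything else, including Cyrillic, is passed through untouched.
CORE_SYMBOLS = {"?", "&", "=", "#", "%", "+", " "}


def reduced_quote(value: str) -> str:
    """Percent-encode only the core symbols, leaving other text readable."""
    return "".join(
        quote(ch, safe="") if ch in CORE_SYMBOLS else ch
        for ch in value
    )


print(reduced_quote("Чай & кафе=1"))  # Чай%20%26%20кафе%3D1
```

Whether a given server or proxy accepts raw non-ASCII bytes in a query string is a separate question; this only shows the encoding side.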
Do you have any further thoughts on it?
Posted
Chris Graham said
Maybe some web browsers show the URLs in the address bar without any unnecessary % encoding, as a way to make them appear nicer to the user, but behind the scenes it may still use them?
From “Post #1617”, 9th Jan 2017
Yeah, I think browsers are going out of their way to make sure non-Latin characters look good in address bars. I type in UTF-8 and the browser sends the request using %-encoding but still displays clean UTF-8 in the address bar.
However, when spaces are typed it turns those into %-encoding immediately.
The browser must be treating non-Latin characters as a special case to make things nicer for you.
So, maybe we can allow through all non-Latin characters, possibly only if a config option is enabled to allow that.
Posted
Further testing showed me that even %-encoded URLs, when clicked, show non-Latin characters nicely in the address bar. Then if I copy and paste the URL out of the address bar, e.g. into a text file, the %-encoding shows. So, I learnt something here that I was not aware of.
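The two forms are the same URL seen from two sides, which a short Python sketch can show. The path here is a made-up example, and `unquote` only approximates what a browser does for display.

```python
from urllib.parse import quote, unquote

# What actually goes over the wire and what you get on copy/paste.
wire_path = "/%D0%A0%D0%9C%D0%A7"

# Roughly what the address bar displays: the decoded UTF-8 text.
display_path = unquote(wire_path)
print(display_path)  # /РМЧ

# Encoding the displayed form gives back the wire form, so nothing is lost.
round_trip = quote(display_path)
print(round_trip == wire_path)  # True
```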
It's a trade-off between having the URLs look nice in the address bar vs having them look nice in code. If we allow Cyrillic it will look nice in the address bar but very ugly in code. If we transliterate it will not look as nice in the address bar but it will look okay in code.
Having a good Bulgarian experience for you is important to us. I've made a new release for you to test, with many changes, so please let me know how it works for you. You'll want to disable "Moniker transliteration" from Admin Zone > Setup > Configuration > Site options > SEO (because we're giving everyone the choice of whether to use transliteration or keep the Cyrillic). These changes will also be in RC30 when we release it.
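For anyone curious how the two settings differ in practice, here is a simplified Python sketch. It is not the actual implementation: `make_moniker`, the `transliterate` flag, and the letter table (loosely based on the usual Bulgarian romanisation) are all assumptions made for illustration.

```python
import re

# Illustrative Bulgarian-to-Latin table; an assumption, not the real one.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ф": "f", "х": "h", "ц": "ts", "ч": "ch",
    "ш": "sh", "щ": "sht", "ъ": "a", "ь": "y", "ю": "yu", "я": "ya",
}


def make_moniker(title: str, transliterate: bool) -> str:
    """Build a URL moniker from a page title (hypothetical helper)."""
    text = title.lower()
    if transliterate:
        # Option enabled: map Cyrillic to Latin, then keep ASCII only.
        text = "".join(TRANSLIT.get(ch, ch) for ch in text)
        text = re.sub(r"[^a-z0-9]+", "-", text)
    else:
        # Option disabled: keep any Unicode letters and digits as they are.
        text = re.sub(r"\W+", "-", text)
    return text.strip("-")


print(make_moniker("Чай и кафе", transliterate=True))   # chay-i-kafe
print(make_moniker("Чай и кафе", transliterate=False))  # чай-и-кафе
```

The first result looks fine in code and links; the second looks better in the address bar but becomes %-encoded anywhere the raw URL is shown.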
