#3587 - Internationalised e-mail addresses and URLs

Identifier #3587
Issue type Feature request or suggestion
Title Internationalised e-mail addresses and URLs
Status Open
Tags

Type: Internationalisation (custom)

Handling member Deleted
Addon core
Description This is a complex topic.

Domain names may use most Unicode characters via Punycode (the encoding behind IDN, basically). Domain names do not carry raw UTF-8 because by convention they map to hostnames, which are limited to ASCII letters, digits, and hyphens and are never going to support that.
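
As a rough illustration of the Punycode mapping, here is a minimal Python sketch using the standard library's "idna" codec. Assumptions: the example domain and the helper names to_ascii and to_unicode are made up for illustration, and Python's built-in codec follows the older IDNA 2003 rules, so this says nothing about how HarmlessURLCoder works today.

```python
def to_ascii(domain: str) -> str:
    """Encode a Unicode domain name into its ASCII (Punycode) form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Decode an ASCII (Punycode) domain name back into Unicode."""
    return domain.encode("ascii").decode("idna")


print(to_ascii("münchen.de"))          # xn--mnchen-3ya.de
print(to_unicode("xn--mnchen-3ya.de"))  # münchen.de
```

Labels that are already plain ASCII pass through both functions unchanged.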

E-mail addresses may use Unicode characters via Internationalized Email (the SMTPUTF8 extension, RFC 6531). Addresses are carried as UTF-8, and it only works when the mail servers involved support the extension, so it's more of a consensus to do things in a proper modern way.

URLs may use a combination of ASCII, URL (percent) encoding, UTF-8, and Punycode. Technically you're not allowed raw UTF-8 in a URL (RFC 3986 requires non-ASCII bytes to be percent-encoded), but it happens when people don't encode fully, and it can be interpreted unambiguously, so accepting it is reasonable.

So what do we need to do?

1) Our HarmlessURLCoder should convert Punycode to UTF-8

2) Our HarmlessURLCoder should also be used when URLs are pasted in and we need link text but can't get a <title> from the page at that URL (i.e. we already show the URL as the link text in that case, but without passing it through HarmlessURLCoder).

3) E-mail address sanitisation server-side and client-side should be significantly loosened, IF a config option is enabled (maybe enabled by default?).
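
To make items 1 and 2 more concrete, here is a minimal Python sketch of turning a URL into readable link text. It is an assumption-laden illustration, not HarmlessURLCoder itself: the function name harmless_display_url is hypothetical, it uses Python's built-in IDNA 2003 codec, it drops any user:password part, and it does not handle IPv6 hosts. The result is meant for display only, not for use as the link target.

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def harmless_display_url(url: str) -> str:
    """Return a human-readable form of url, for use as link text only."""
    parts = urlsplit(url)

    # Decode a Punycode host to Unicode; keep it as-is if that fails.
    host = parts.hostname or ""
    try:
        host = host.encode("ascii").decode("idna")
    except UnicodeError:
        pass

    # An invalid port raises ValueError; in that case leave the port out.
    try:
        port = parts.port
    except ValueError:
        port = None
    netloc = host if port is None else f"{host}:{port}"

    # Decode percent-encoding only when the bytes are valid UTF-8.
    try:
        path = unquote(parts.path, errors="strict")
    except UnicodeDecodeError:
        path = parts.path

    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


print(harmless_display_url("http://xn--mnchen-3ya.de/caf%C3%A9"))
```

Falling back to the original text whenever decoding fails keeps the link text at least as informative as what is shown now.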
Steps to reproduce

Additional information There are a lot of concerns...

a) Punycode is intentionally crippled by browsers because it can lead to attacks. See https://wiki.mozilla.org/IDN_Display_Algorithm
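
To give a flavour of the kind of check involved, here is a rough Python sketch that flags a domain label mixing letters from more than one script, which is the classic lookalike trick. This is NOT Mozilla's algorithm: it guesses the script from the first word of each character's Unicode name (an approximation chosen only because the standard library has no script lookup), the function names are hypothetical, and it would wrongly flag legitimate combinations such as Japanese kanji with hiragana.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Approximate the set of scripts used by the letters in a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # e.g. "LATIN SMALL LETTER A" -> "LATIN" (rough approximation)
            scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True if the label's letters appear to come from more than one script."""
    return len(scripts_in_label(label)) > 1


print(is_mixed_script("apple"))        # False
print(is_mixed_script("\u0430pple"))   # True (first letter is Cyrillic)
```

A display routine could keep such labels in their Punycode form instead of decoding them.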

b) I have concerns about non-ASCII e-mail addresses, because allowing all kinds of symbolic characters and Unicode is likely to significantly increase the chance of typos that can't be detected.
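
The loosening in item 3, and the trade-off in this concern, could look something like the following Python sketch. Everything here is an assumption for illustration: the function name is_plausible_email, the allow_unicode flag standing in for the config option, and the deliberately simple strict pattern. It is not the current sanitisation code.

```python
import re

# Deliberately simple strict pattern, ASCII only (an illustrative assumption).
ASCII_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def is_plausible_email(address: str, allow_unicode: bool = False) -> bool:
    """Check an address strictly, or loosely if allow_unicode is set."""
    if ASCII_EMAIL.fullmatch(address):
        return True
    if not allow_unicode:
        return False

    # Loose mode: any non-space local part, plus a domain that can be
    # converted to Punycode by Python's built-in IDNA 2003 codec.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or "." not in domain:
        return False
    if any(ch.isspace() or ch == "@" for ch in local):
        return False
    try:
        domain.encode("idna")
    except UnicodeError:
        return False
    return True


print(is_plausible_email("jürgen@münchen.de"))                      # False
print(is_plausible_email("jürgen@münchen.de", allow_unicode=True))  # True
```

Note how little the loose mode can reject, which is exactly the typo-detection worry: almost any string with an "@" and a dotted domain gets through.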

c) I don't think Punycode or Internationalized Email is in very common use. E.g. the Chinese service "Weibo" uses a transliterated domain, weibo.com. I think people are used to this. I don't have data though. Realistically it is easier for the world if we all use Latin (ASCII) identifiers for things, as they are easier to share and type. This may well just remain the predominant de facto standard regardless of the actual standards.



The best thing for now may be to do nothing until a practical concern comes up from someone actually affected.
Funded? No
