Can URL encoding be disabled? [closed]

Hi! Is there a way to disable URL encoding for non-ASCII (UTF-8) characters? After saving the HTML file, the links are converted to URL-encoded (percent-encoded) form and IE is unable to read them. Thanks for your help!

Example:

Before editing and saving in LibreOffice:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<meta http-equiv="content-type" content="text/html; charset=windows-1250">
<body lang="cs-CZ" text="#000000" dir="ltr" style="background: transparent">
<p><a href="Íii.odt" target="_blank">Íii</a></p>
<p>
</p>
</body>
</html>


After editing and saving in LibreOffice:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<meta http-equiv="content-type" content="text/html; charset=windows-1250">
<title></title>
<meta name="created" content="0;0">
<meta name="changed" content="20140311;164828687000000">
<style type="text/css">
<!--
p { color: #000000 }
-->
</style>
<body lang="cs-CZ" text="#000000" dir="ltr" style="background: transparent">
<p><br>
</p>
</body>
</html>


Closed for the following reason: the question is answered, right answer was accepted. Closed by Alex Kemp, close date 2016-02-20 07:30:30.921183


The underlying problem seems to be charset=windows-1250: IE seems to stumble over the URL being percent-encoded as UTF-8 while the meta tag declares the document to be windows-1250. If you can make the document use charset=utf-8 instead, things might work.
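To illustrate the suspected mismatch, here is a minimal Python sketch. It is not what LibreOffice or IE actually run; the hypothetical consumer that decodes with the charset from the meta tag is an assumption made only for the demonstration. It percent-encodes the file name from the question under both charsets and then decodes the UTF-8 form as windows-1250.

```python
from urllib.parse import quote, unquote

name = "Íii.odt"  # link target from the question

# Percent-encode the same name under both charsets discussed in this thread.
as_utf8 = quote(name, encoding="utf-8")      # Í becomes two bytes: C3 8D
as_cp1250 = quote(name, encoding="cp1250")   # Í becomes one byte: CD

# Hypothetical consumer that trusts the meta tag and decodes as windows-1250.
misread = unquote(as_utf8, encoding="cp1250")

print(as_utf8)          # %C3%8Dii.odt
print(as_cp1250)        # %CDii.odt
print(misread == name)  # False: the decoded name no longer matches the file
```

If the escapes and the declared charset agree, the round trip gives back the original name, which is why switching the document to charset=utf-8 is worth trying.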


Well, the charset change doesn't help, because IE doesn't support UTF-8-encoded URLs at all. See http://support.microsoft.com/kb/941052

( 2014-03-12 15:13:24 +0200 )

The original example uses Windows-1250 encoding (not UTF-8), i.e., Í is stored as the single byte CD (hex) / 205 (decimal), whereas in UTF-8 the same character (U+00CD) is stored as the two bytes C3 8D. LO is converting the Windows-1250 character to percent-encoding, as per the W3C (RFC standard) recommendation. @erAck is correct in that with charset=utf-8 I can save the original HTML under v4.1.4.2 and obtain <A HREF="Íii.odt" TARGET="_blank">Íii</A>, which is valid UTF-8 encoding. If your host operating system is set to use Windows-1250 encoding, you will have a larger issue.

( 2014-03-13 00:27:48 +0200 )

Well, that's strange. Of course I changed the charset and tried it, but it is not working. I filed a bug report, so we will see: https://bugs.freedesktop.org/show_bug.cgi?id=76080

( 2014-03-13 07:59:32 +0200 )

According to Wikipedia, percent-encoding (a.k.a. URL encoding) is supposed to be applied to any character in a URI that is not in one of these two sets:

RFC 3986 section 2.2, Reserved Characters:

! * ' ( ) ; : @ & = + $ , / ? # [ ]

RFC 3986 section 2.3, Unreserved Characters:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

a b c d e f g h i j k l m n o p q r s t u v w x y z

0 1 2 3 4 5 6 7 8 9 - _ . ~

... in other words, most non-ASCII characters should be percent encoded.
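As a rough sketch of that rule (not LibreOffice's actual export code), the helper below leaves the characters from the two lists above untouched and turns every other byte into a %XX escape. The name percent_encode and its parameters are made up for this example, and the choice of UTF-8 as the default charset is an assumption.

```python
import string

# The two character sets listed above (RFC 3986 sections 2.2 and 2.3).
RESERVED = "!*'();:@&=+$,/?#[]"
UNRESERVED = string.ascii_letters + string.digits + "-_.~"

def percent_encode(text, charset="utf-8", keep=UNRESERVED + RESERVED):
    """Hypothetical helper: escape every byte whose character is not in `keep`."""
    out = []
    for byte in text.encode(charset):
        ch = chr(byte)
        if byte < 0x80 and ch in keep:
            out.append(ch)               # allowed ASCII character, copied as-is
        else:
            out.append(f"%{byte:02X}")   # everything else becomes %XX
    return "".join(out)

print(percent_encode("Íii.odt"))                    # %C3%8Dii.odt
print(percent_encode("Íii.odt", charset="cp1250"))  # %CDii.odt
print(percent_encode("docs/my file.odt"))           # docs/my%20file.odt
```

Note that the escape depends on the charset used to turn the character into bytes, which is exactly where the windows-1250 versus UTF-8 question in this thread comes from.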


From the Microsoft IE support page: "To represent a non-US-ASCII character, you must use that non-US-ASCII character directly in the encoding of the document in which you write the URI."

More details: http://support.microsoft.com/kb/941052

So using UTF-8-encoded characters is definitely not a good solution while a big portion of users still use IE.

( 2014-03-12 15:23:52 +0200 )

Stats

Seen: 1,819 times

Last updated: Mar 12 '14