Can URL encoding be disabled? [closed]

asked 2014-03-11 15:58:04 +0200

Tomas Tunkl

updated 2014-03-12 12:37:49 +0200

oweng

Hi! Is there a way to disable URL encoding for non-ASCII (UTF-8) characters? After the HTML file is saved, the links are converted to percent-encoded (URL-encoded) form and IE is unable to read them. Thanks for your help!

Example:

Before editing and saving in LibreOffice:

  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
  <html>
  <head>
  <meta http-equiv="content-type" content="text/html; charset=windows-1250">
  </head>
  <body lang="cs-CZ" text="#000000" dir="ltr" style="background: transparent">
  <p><a href="Íii.odt" target="_blank">Íii</a></p>
  <p>   
  </p>
  </body>
  </html>

After editing and saving in LibreOffice:

  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
  <html>
  <head>
  <meta http-equiv="content-type" content="text/html; charset=windows-1250">
  <title></title>
  <meta name="created" content="0;0">
  <meta name="changed" content="20140311;164828687000000">
  <style type="text/css">
  <!--
      p { color: #000000 }
  -->
  </style>
  </head>
  <body lang="cs-CZ" text="#000000" dir="ltr" style="background: transparent">
  <p><a href="%C3%ADii.odt" target="_blank">Íii</a></p>
  <p><br>
  </p>
  </body>
  </html>

Closed for the following reason: the question is answered, right answer was accepted by Alex Kemp
Close date: 2016-02-20 07:30:30.921183

2 Answers

1

answered 2014-03-12 13:24:24 +0200

erAck

The underlying problem seems to be charset=windows-1250: IE seems to stumble over the URL being percent-encoded as UTF-8 while the meta tag declares windows-1250. If you could make the document use charset=utf-8 instead, things might work.
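To make the suspected mismatch concrete, here is a minimal Python sketch (not LibreOffice or IE code) that decodes the saved href once as UTF-8 and once as windows-1250. The variable names are illustrative, and the assumption that a reader following the meta tag would decode the escaped bytes as windows-1250 is only a model of the behavior described in this answer.

```python
from urllib.parse import unquote

# The href exactly as LibreOffice wrote it in the saved file from the question.
href = "%C3%ADii.odt"

# Interpretation 1: the escaped bytes are UTF-8 (what LibreOffice produced).
as_utf8 = unquote(href, encoding="utf-8")

# Interpretation 2 (assumption): the escaped bytes are read using the
# charset declared in the meta tag, windows-1250 ("cp1250" in Python).
as_cp1250 = unquote(href, encoding="cp1250")

print(repr(as_utf8))    # the intended file name
print(repr(as_cp1250))  # a different, wrong file name
```

The two results differ, so a file lookup based on the second interpretation would not find the document.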


Comments

Well, the charset change doesn't help. That is because IE doesn't support the UTF-8 format of URLs at all. See http://support.microsoft.com/kb/941052

Tomas Tunkl (2014-03-12 15:13:24 +0200)

The original example uses Windows-1250 encoding (not UTF-8), i.e., Í is stored as the single byte 0xCD (205 decimal), whereas in UTF-8 the same character (code point U+00CD) is the two bytes 0xC3 0x8D. LO is converting the Windows-1250 character to percent-encoding, as per the W3C (RFC standard) recommendation. @erAck is correct in that with charset=utf-8 I can save the original HTML under v4.1.4.2 and obtain <A HREF="Íii.odt" TARGET="_blank">Íii</A>, which is valid UTF-8 encoding. If your host operating system is set to use Windows-1250 encoding, you will have a larger issue.

oweng (2014-03-13 00:27:48 +0200)

Well, strange. Of course I changed the charset and tried it, but it is not working. I filed a bug report, so we will see: https://bugs.freedesktop.org/show_bug.cgi?id=76080

Tomas Tunkl (2014-03-13 07:59:32 +0200)
0

answered 2014-03-12 12:50:12 +0200

oweng

According to Wikipedia, percent-encoding (a.k.a. URL encoding) is supposed to be applied to any character in a URI that is not in one of these two sets:

RFC 3986 section 2.2, Reserved Characters:

! * ' ( ) ; : @ & = + $ , / ? # [ ]

RFC 3986 section 2.3, Unreserved Characters:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

a b c d e f g h i j k l m n o p q r s t u v w x y z

0 1 2 3 4 5 6 7 8 9 - _ . ~

... in other words, non-ASCII characters should be percent-encoded.
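The rule can be sketched in a few lines of Python. This is an illustrative helper, not the code LibreOffice uses: `percent_encode` and `UNRESERVED` are hypothetical names, and for simplicity the sketch escapes reserved characters too, which is appropriate for a single file name but not for a full URL. It also shows that the escape sequence depends on which byte encoding is applied first.

```python
import string

# RFC 3986 section 2.3 unreserved characters.
UNRESERVED = set(string.ascii_letters + string.digits + "-_.~")


def percent_encode(text: str, encoding: str = "utf-8") -> str:
    """Percent-encode every byte that is not an unreserved ASCII character."""
    out = []
    for byte in text.encode(encoding):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)              # safe to emit literally
        else:
            out.append(f"%{byte:02X}")  # escape as %HH
    return "".join(out)


print(percent_encode("Íii.odt"))            # UTF-8 bytes of uppercase Í
print(percent_encode("íii.odt"))            # UTF-8 bytes of lowercase í
print(percent_encode("Íii.odt", "cp1250"))  # windows-1250 byte of Í
```

Note that the `%C3%AD` sequence in the saved file from the question is the UTF-8 form of lowercase í; uppercase Í would be `%C3%8D`.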


Comments

From Microsoft IE support page: "To represent a non-US-ASCII character, you must use that non-US-ASCII character directly in the encoding of the document in which you write the URI."

Check more: http://support.microsoft.com/kb/941052

So using UTF-8-encoded characters is definitely not a good solution while a big portion of users still use IE.

Tomas Tunkl (2014-03-12 15:23:52 +0200)
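If the raw characters really are required, one possible workaround is to post-process the saved file outside LibreOffice. The following is a rough Python sketch under stated assumptions: `unescape_hrefs` and `HREF_RE` are hypothetical names, the escapes are assumed to be UTF-8, and the document is assumed to be stored as windows-1250. It has not been tested against IE.

```python
import re
from urllib.parse import unquote

# Matches double-quoted href attributes only (a simplification).
HREF_RE = re.compile(r'(href=")([^"]*)(")', re.IGNORECASE)


def unescape_hrefs(html: str) -> str:
    """Replace percent-escapes inside href values with the raw characters."""
    # Caveat: this also decodes escapes such as %20 or %23, which can change
    # the meaning of a URL; it is only reasonable for simple relative file names.
    return HREF_RE.sub(
        lambda m: m.group(1) + unquote(m.group(2), encoding="utf-8") + m.group(3),
        html,
    )


saved = '<p><a href="%C3%ADii.odt" target="_blank">Íii</a></p>'
fixed = unescape_hrefs(saved)

# Encode the result in the charset the meta tag declares, so the link
# character is written directly in the document's own encoding.
raw_bytes = fixed.encode("cp1250")
print(fixed)
```

Here `raw_bytes` is what would be written back to disk in binary mode; writing it with a different encoding would reintroduce the mismatch.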
