HTTP UTF8 backtrack - Internet Security Systems

HTTP UTF8 backtrack

advICE :Intrusions : 2000639

An abnormal URL was seen that appears to be designed to bypass the checks performed by the web server. Specifically, the URL contains a pattern that allows a remote intruder to access files or run programs on the web server machine.


In October of 2000, a serious flaw was discovered in Microsoft's IIS web-server software. This flaw would allow a remote intruder to execute any program on the web server, including programs that would allow the intruder to gain control over the web-server.

The flaw takes advantage of the fact that the underlying operating system accepts file names using the Unicode UTF8 character set, whereas the web-server does not. This allows a hostile filename to pass through the web-server's check.

The specific attack for exploiting this flaw uses a technique known as "backtracking", "directory climbing", or "directory traversal". This general problem has been found in numerous web servers (see the "HTTP URL directory traversal/climbing" entry listed at the end of this page). In order to guard against intruders using this technique, a web server must "normalize" or "canonicalize" the URL. In other words, before accessing the requested files, the web server must first interpret the path in order to resolve and remove all the backtracking information.
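The normalization step can be illustrated with a minimal Python sketch. This is not the logic of IIS or of any particular web server; the function name `canonicalize` and the decision to return `None` for a path that climbs above the web root are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import unquote


def canonicalize(url_path: str) -> Optional[str]:
    """Resolve "." and ".." segments; return None if the path leaves the web root.

    Hypothetical helper: decodes %XX escapes first, then interprets the path.
    """
    # Decode percent-escapes BEFORE checking, so "%2e%2e" is seen as "..".
    decoded = unquote(url_path)
    # Treat the backslash as a separator too, as Windows file systems do.
    decoded = decoded.replace("\\", "/")

    stack = []
    for segment in decoded.split("/"):
        if segment in ("", "."):
            continue                  # empty or "current directory": ignore
        if segment == "..":
            if not stack:
                return None           # would climb above the web root
            stack.pop()               # step back one directory
        else:
            stack.append(segment)
    return "/" + "/".join(stack)


print(canonicalize("/docs/./img/../index.html"))
print(canonicalize("/scripts/../../winnt/system32/cmd.exe"))
```

The important design point is the order of operations: decoding must be complete before the ".." check runs, otherwise an encoded form of the same characters slips past the check.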

The IIS flaw can be exploited by first encoding the characters in such a way as to defeat the normalization/canonicalization process. The technique is to encode the characters with Unicode.

About Unicode and UTF8

Traditional character sets support only the roughly 100 characters used in English. Additional character sets had to be developed for other languages, each language having its own character set. In the past, there were a number of incompatible character sets, which meant that a single computer couldn't support both Chinese and Russian. The Unicode character set was developed to provide a universal coding of characters. Today, this character set includes simultaneous support for roughly 30,000 different characters, covering languages such as Chinese, Japanese, Korean, Russian, Arabic, and so forth.

A complication with Unicode is that it requires two bytes (16 bits) for each character, whereas traditional computer software uses only a single byte (8 bits) for each character. As a result, a "multibyte" encoding for Unicode was developed, called "UTF-8" (Unicode Transformation Format, 8-bit). Instead of representing characters with a fixed number of bytes (one byte for English or two bytes for Unicode), a variable number of bytes is used. In this scheme, English text still requires only a single byte per character. Western European languages (e.g. French) occasionally require two bytes for some characters. Eastern European languages (e.g. Russian) require two bytes for most characters, and Far Eastern languages (e.g. Chinese) require three bytes for most characters. This scheme allows traditional one-byte software to process even Chinese text.
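The variable-length behaviour is easy to observe with Python's built-in UTF-8 encoder. The sample characters below are chosen only for illustration, and `utf8_len` is a hypothetical helper name.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes the standard UTF-8 encoder uses for one character."""
    return len(ch.encode("utf-8"))


# One sample character per language group mentioned in the text.
samples = {
    "English (a)": "a",
    "French (e with acute accent)": "\u00e9",
    "Russian (Cyrillic Ya)": "\u042f",
    "Chinese (zhong)": "\u4e2d",
}

for label, ch in samples.items():
    print(f"{label}: {utf8_len(ch)} byte(s), hex {ch.encode('utf-8').hex(' ')}")
```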

About the attack

One of the problems with UTF8 is that it can encode English text not only in the intended single-byte format, but also with two or three bytes. These are called "redundant" encodings. The problem with IIS is that it treats the redundant encodings as different characters, whereas the underlying operating system treats them as the same character.

In order to traverse directories, the intruder must specify directories with the name ".." (two dots). An example would be something like:

In order to defeat the canonicalization/backtracking check, the intruder would need to UTF8 encode either the dots, or the slashes separating the dots from the other directory/file names. The UTF8 encodings for these characters are:
Character   1 byte   2 bytes   3 bytes
.           2E       C0 AE     E0 80 AE
/           2F       C0 AF     E0 80 AF
\           5C       C1 9C     E0 81 9C
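The redundant encodings can be derived mechanically from the UTF-8 bit layout. The following Python sketch is illustrative only: `encode_redundant` and `lenient_decode` are hypothetical helpers, and `lenient_decode` imitates a decoder that does not reject redundant forms (a standards-conforming decoder, such as Python's own, does reject them).

```python
def encode_redundant(code_point: int, length: int) -> bytes:
    """Encode code_point using exactly `length` bytes (1 to 3), even if redundant."""
    if length == 1 and code_point <= 0x7F:
        return bytes([code_point])
    if length == 2 and code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if length == 3 and code_point <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("unsupported code point/length combination")


def lenient_decode(data: bytes) -> int:
    """Decode one 1- to 3-byte sequence WITHOUT rejecting redundant forms."""
    if len(data) == 1:
        return data[0]
    if len(data) == 2:
        return ((data[0] & 0x1F) << 6) | (data[1] & 0x3F)
    if len(data) == 3:
        return ((data[0] & 0x0F) << 12) | ((data[1] & 0x3F) << 6) | (data[2] & 0x3F)
    raise ValueError("only 1- to 3-byte sequences are handled")


for ch in "./\\":
    forms = [encode_redundant(ord(ch), n).hex(" ").upper() for n in (1, 2, 3)]
    print(ch, forms)

# A strict decoder refuses the two-byte form of "/":
try:
    bytes([0xC0, 0xAF]).decode("utf-8")
except UnicodeDecodeError:
    print("strict decoder rejected C0 AF")
```

The mismatch described in the text arises when one component behaves like the strict check (or does no decoding at all) while another component behaves like `lenient_decode`.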

In order to evade canonicalization, the above example URL could be encoded as follows:


This alert triggers whenever a backtracking part of a URL (e.g. "../") has been seen where any of these characters have been encoded with a UTF8 multibyte encoding.
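A check of this kind might be sketched as follows. This is not the actual detection logic of the product; the function names are hypothetical, the sketch handles only two- and three-byte sequences, and the sample URLs are illustrative.

```python
from urllib.parse import unquote_to_bytes

DOT, SLASH, BACKSLASH = 0x2E, 0x2F, 0x5C


def lenient_decode_path(raw: bytes):
    """Return (code_point, was_multibyte) pairs, accepting redundant encodings."""
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        two = (b & 0xE0) == 0xC0 and i + 1 < len(raw) and (raw[i + 1] & 0xC0) == 0x80
        three = ((b & 0xF0) == 0xE0 and i + 2 < len(raw)
                 and (raw[i + 1] & 0xC0) == 0x80 and (raw[i + 2] & 0xC0) == 0x80)
        if two:
            out.append((((b & 0x1F) << 6) | (raw[i + 1] & 0x3F), True))
            i += 2
        elif three:
            cp = ((b & 0x0F) << 12) | ((raw[i + 1] & 0x3F) << 6) | (raw[i + 2] & 0x3F)
            out.append((cp, True))
            i += 3
        else:
            out.append((b, False))    # plain single byte
            i += 1
    return out


def has_encoded_backtrack(url: str) -> bool:
    """True if "../" or "..\\" appears with at least one multibyte-encoded character."""
    decoded = lenient_decode_path(unquote_to_bytes(url))
    for i in range(len(decoded) - 2):
        window = decoded[i:i + 3]
        cps = [cp for cp, _ in window]
        if cps[0] == DOT and cps[1] == DOT and cps[2] in (SLASH, BACKSLASH):
            if any(multi for _, multi in window):
                return True
    return False


print(has_encoded_backtrack("/scripts/..%c0%af../winnt/system32/cmd.exe"))
print(has_encoded_backtrack("/scripts/../index.html"))
```

A plain "../" is deliberately not flagged here, since the alert described above concerns only backtracking sequences in which a character has been multibyte encoded.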


Attacks using this technique were quite common in the latter part of the year 2000. Examine the URL indicated as part of this attack in order to determine if it looks like a hostile command.

Also, the event will often contain the return code from the web server. If the return code is something like "404", then the attack was ineffective. However, if the return code is something like "502", then there is a good chance the attack was successful.

More information
MS Bulletin: MS00-078   Patch Available for "Web Server Folder Traversal" Vulnerability
The official Unicode site.
RFC2044   UTF-8, a transformation format of Unicode and ISO 10646
HTTP URL directory traversal/climbing   A description of the underlying problem, as well as a listing of web servers that are vulnerable to it.
q269862   Microsoft Patch Released for Canonicalization Error Issue
ISS Advisory 68   Serious flaw in Microsoft IIS UNICODE translation

Parametric information
URL   The URL stem of the request triggering this alert.

Version appeared: 2.1 
