Spending a night reading the .ZIP File Format Specification

One night I was bored and wanted something to read. Some people prefer novels, others prefer poems; I prefer an RFC or a specification. So I started reading the .ZIP File Format Specification.

The overall format of a .ZIP file can be found in paragraph 4.3.6 of the specification and is the following:

 4.3.6 Overall .ZIP file format:

      [local file header 1]
      [encryption header 1]
      [file data 1]
      [data descriptor 1]
      . 
      .
      .
      [local file header n]
      [encryption header n]
      [file data n]
      [data descriptor n]
      [archive decryption header] 
      [archive extra data record] 
      [central directory header 1]
      .
      .
      .
      [central directory header n]
      [zip64 end of central directory record]
      [zip64 end of central directory locator] 
      [end of central directory record]

Let's take a closer look at the local file header and the central directory structure:

 4.3.7  Local file header:

      local file header signature     4 bytes  (0x04034b50)
      version needed to extract       2 bytes
      general purpose bit flag        2 bytes
      compression method              2 bytes
      last mod file time              2 bytes
      last mod file date              2 bytes
      crc-32                          4 bytes
      compressed size                 4 bytes
      uncompressed size               4 bytes
      file name length                2 bytes
      extra field length              2 bytes

      file name (variable size)
      extra field (variable size)

 4.3.12 Central directory structure:

      [central directory header 1]
      .
      .
      .
      [central directory header n]
      [digital signature]

      File header:
          central file header signature   4 bytes (0x02014b50)
          version made by                 2 bytes
          version needed to extract       2 bytes
          general purpose bit flag        2 bytes
          compression method              2 bytes
          last mod file time              2 bytes
          last mod file date              2 bytes
          crc-32                          4 bytes
          compressed size                 4 bytes
          uncompressed size               4 bytes
          file name length                2 bytes
          extra field length              2 bytes
          file comment length             2 bytes
          disk number start               2 bytes
          internal file attributes        2 bytes
          external file attributes        4 bytes
          relative offset of local header 4 bytes

          file name (variable size)
          extra field (variable size)
          file comment (variable size)
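
To make the two layouts concrete, here is a minimal Python sketch that reads the file name out of each header with the standard struct module. The format strings simply follow the field order and sizes quoted above; the helper names local_file_name and central_file_name are made up for this example, and the sketch assumes a plain archive (no ZIP64, no encryption).

```python
import io
import struct
import zipfile

# Fixed-size parts of the two headers, little-endian, fields in spec order.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")          # 30 bytes (4.3.7)
CENTRAL_HEADER = struct.Struct("<IHHHHHHIIIHHHHHII")  # 46 bytes (4.3.12)


def local_file_name(data, offset):
    """Return the raw file name stored in the local file header at offset."""
    fields = LOCAL_HEADER.unpack_from(data, offset)
    if fields[0] != 0x04034B50:
        raise ValueError("not a local file header")
    name_len = fields[9]  # "file name length"
    start = offset + LOCAL_HEADER.size
    return data[start:start + name_len]


def central_file_name(data, offset):
    """Return the raw file name stored in the central File header at offset."""
    fields = CENTRAL_HEADER.unpack_from(data, offset)
    if fields[0] != 0x02014B50:
        raise ValueError("not a central directory File header")
    name_len = fields[10]  # "file name length"
    start = offset + CENTRAL_HEADER.size
    return data[start:start + name_len]


# Build a tiny one-entry archive in memory to try the helpers on.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("test.txt", "hello")
data = buf.getvalue()

print(local_file_name(data, 0))
# Shortcut that is only safe for this tiny demo: search for the signature.
print(central_file_name(data, data.rfind(b"PK\x01\x02")))
```

For an ordinary archive like this one, both calls return the same name. Nothing in the layout itself forces that.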

Did you see that? The file name appears in two places: in the local file header and in the File header of the central directory structure.

I tried to find out which file name I MUST use when I extract the .ZIP file, but I could not find anything in the specification. This made me wonder: what happens if a zip archive contains one file name in the local file header and a different one in the File header of the central directory structure?

To test that, I created a .ZIP file that uses the file name “test.txt” in the local file header and the file name “test.exe” in the File header. There is only one file in the archive, and it is an executable; the only difference is the file name.

part of gweeperx.zip

I have highlighted the hex values which indicate the start of the local file header and the one of the file header respectively. I have also highlighted the different file names.
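
A similar archive can be produced with a few lines of Python. This is only a sketch of one possible way to do it, not necessarily how gweeperx.zip was made: it lets the standard zipfile module write a normal archive and then overwrites the name in the central directory. The function name build_mismatched_zip is hypothetical, and the sketch assumes a single entry, no archive comment and no ZIP64 records.

```python
import io
import struct
import zipfile


def build_mismatched_zip(local_name, central_name, payload):
    """Build a one-entry archive whose two headers disagree on the file name.

    Both names must have the same length, so that no length or offset field
    has to be recomputed after patching.
    """
    if len(local_name) != len(central_name):
        raise ValueError("names must have the same length")

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(local_name.decode("ascii"), payload)
    data = bytearray(buf.getvalue())

    # Without an archive comment, the end of central directory record is the
    # last 22 bytes; the central directory offset sits 16 bytes into it.
    (cd_offset,) = struct.unpack_from("<I", data, len(data) - 22 + 16)

    # The file name follows the 46-byte fixed part of the File header.
    name_start = cd_offset + 46
    name_end = name_start + len(central_name)
    if bytes(data[name_start:name_end]) != local_name:
        raise ValueError("unexpected archive layout")
    data[name_start:name_end] = central_name
    return bytes(data)


archive = build_mismatched_zip(b"test.txt", b"test.exe", b"dummy payload")

# Name in the local file header (the first header starts at offset 0).
print(archive[30:38])
# Names as Python's zipfile reports them; it reads the central directory.
print(zipfile.ZipFile(io.BytesIO(archive)).namelist())
```

Because “test.txt” and “test.exe” have the same length, the patch does not disturb any other field in the archive.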

I tested the above .ZIP file with different programs, libraries and sites.

It seems that the default (pre-installed) programs used to unzip .ZIP files on Linux and Windows, as well as the majority of commercial and non-commercial products, mostly rely on the file name in the File header of the central directory and not on the one in the local file header.

However, there are exceptions to this rule. In fact, there are many. Dropbox and JSZip (a library for creating, reading and editing .ZIP files with JavaScript, with more than 3,000,000 weekly downloads) are only two of them.

Dropbox

When we preview the .ZIP file in Dropbox, it shows a file named test.txt. However, when we download it, we get a file with a different name, test.exe.

JSZip

The same happens with JSZip. When we read the contents of the zip archive, we find a text file. However, when we extract it, we get an executable. The unzip tool detects that the file name in the File header differs from the one in the local file header.

Linux

I didn’t see that coming. If you drag and drop the file, you get test.exe. However, if you choose to extract the contents, you get test.txt.

This behavior can lead to major vulnerabilities, and we would be more than happy to hear your story or idea if you come up with something.

One scenario would be the following:

Let’s assume that a zip archive contains an executable, test.exe. The .ZIP uses the file name “test.txt” in the local file header and “test.exe” in the File header.
A site owner does not allow .ZIP files containing “*.exe” files to be uploaded to her server, so her web application uses JSZip to check that the .ZIP does not contain files with the “.exe” extension.
JSZip reports that the .ZIP contains the text file “test.txt”, and thus the web application allows the user to upload it.
However, if the web application does not extract the .ZIP file with JSZip, and the site owner instead sends these .ZIP files by e-mail, or extracts them with the default Windows unzip, WinRAR or the Linux unzip, then “test.exe” is extracted.

What could possibly go wrong?
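
One way an upload filter could defend itself in the scenario above is to refuse any archive whose two sets of names disagree. The following Python sketch walks the central directory and compares each name with the one in the corresponding local file header. The function name find_name_mismatches is made up for this example, and the code assumes a single-disk archive without ZIP64 records. It also finds the end of central directory record with a naive backwards search, which a crafted archive comment could fool, so treat it as an illustration rather than a hardened validator.

```python
import io
import struct
import zipfile

EOCD = struct.Struct("<IHHHHIIH")                     # 22 bytes
CENTRAL_HEADER = struct.Struct("<IHHHHHHIIIHHHHHII")  # 46 bytes
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")          # 30 bytes


def find_name_mismatches(data):
    """Return a list of (central_name, local_name) pairs that differ."""
    eocd_pos = data.rfind(b"PK\x05\x06")  # naive search, see the note above
    if eocd_pos < 0:
        raise ValueError("end of central directory record not found")
    eocd = EOCD.unpack_from(data, eocd_pos)
    total_entries, cd_offset = eocd[4], eocd[6]

    mismatches = []
    pos = cd_offset
    for _ in range(total_entries):
        central = CENTRAL_HEADER.unpack_from(data, pos)
        if central[0] != 0x02014B50:
            raise ValueError("bad central directory File header")
        name_len, extra_len, comment_len = central[10], central[11], central[12]
        local_offset = central[16]  # "relative offset of local header"
        name_start = pos + CENTRAL_HEADER.size
        central_name = data[name_start:name_start + name_len]

        local = LOCAL_HEADER.unpack_from(data, local_offset)
        if local[0] != 0x04034B50:
            raise ValueError("bad local file header")
        local_start = local_offset + LOCAL_HEADER.size
        local_name = data[local_start:local_start + local[9]]

        if central_name != local_name:
            mismatches.append((central_name, local_name))
        # Skip the variable-size fields to reach the next File header.
        pos = name_start + name_len + extra_len + comment_len
    return mismatches


# Demo: a clean two-entry archive, then the same bytes with one name patched.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("test.txt", "hello")
    zf.writestr("note.txt", "world")
clean = buf.getvalue()

pos = clean.rfind(b"test.txt")  # last occurrence is in the central directory
patched = clean[:pos] + b"test.exe" + clean[pos + 8:]

print(find_name_mismatches(clean))
print(find_name_mismatches(patched))
```

A filter built on this idea would reject the upload whenever the returned list is not empty, instead of trusting either name on its own.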

Resources

GitHub

Visit our GitHub at https://github.com/RedyOpsResearchLabs/
