UIUCTF 2020 - Zip Heck (350pts)

July 2020

Author: Ptomerty

Challenge description:

I zipped up the flag a few times for extra security.

https://files.uiuc.tf/flag.zip

The intended solution runs in under 10 minutes on a typical computer.

Author: kuilin

File mirror: flag.zip

Gathering Info

In this challenge, we’re provided a ZIP file and told to find the flag. First, let’s confirm that we’re working with a legitimate ZIP file and that no shenanigans are present:

$ file flag.zip
flag.zip: Zip archive data, at least v2.0 to extract
$ binwalk flag.zip
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             Zip archive data, at least v2.0 to extract, compressed size: 61368967, uncompressed size: 61350811, name: flag.zip

Okay, seems normal enough. Let’s try unzipping it to see what horrors lie within:

$ unzip flag.zip
Archive:  flag.zip
made for uiuctf by kuilin
replace flag.zip? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
  inflating: flag.zip

Like a matryoshka doll, unzipping our original flag.zip yields another, slightly smaller flag.zip. Let’s do a quick sanity check and compare the extracted file against a copy of the original download (kept here as orig.zip):

$ ls -l flag.zip orig.zip
-rwxrwxrwx 1 john john 61350811 Dec 31  1979 flag.zip
-rwxrwxrwx 1 john john 61369113 Jul 18 13:54 orig.zip

Attempt 1: DigitalOcean Credits Go Brrrrr

At first glance, this challenge looked ez: write a bash loop that repeatedly extracts our ZIP until we’re left with the center of our ZIP onion, which is hopefully our flag. I wrote a quick script and stuck it in a screen session, then headed off to sleep:

$ while true; do
7z e flag.zip -aou > /dev/null # extract to flag_1.zip, silence output
cp flag_1.zip flag.zip # if we stop script, we can recover
rm flag_1.zip          # either flag.zip or flag_1.zip
done

…only to wake up super disappointed. A full night’s worth of unzipping had only reduced our flag.zip from 59,931 KB to 45,047 KB, and the latest iterations were only shaving off ~1 KB from the ZIP every few seconds.

Clearly, this solution was unsustainable. Let’s roll up our sleeves and get down and dirty with the ZIP file format.

Extracting ZIP’s Secrets

In our modern “TBs aren’t enough storage” world, an archive file is usually considered synonymous with a compressed file, since RAR, 7z, and ZIP tools typically compress their contents automatically using various algorithms. However, before compression methods were widespread, archive files existed primarily to group several files into a single file, making file transfer easier.¹

Let’s see what a ZIP file actually looks like by skimming the Wikipedia page. We note that ZIP supports a lot of compression methods, the most common of which is DEFLATE, but also supports a STORE (no compression) method. With that in mind, let’s see what our flag file has to offer us, using a utility called zipdetails:

$ zipdetails -v flag.zip

0000000 0000004 50 4B 03 04 LOCAL HEADER #1       04034B50
0000004 0000001 14          Extract Zip Spec      14 '2.0'
0000005 0000001 00          Extract OS            00 'MS-DOS'
0000006 0000002 00 00       General Purpose Flag  0000
                            [Bits 1-2]            0 'Normal Compression'
0000008 0000002 08 00       Compression Method    0008 'Deflated'
000000A 0000004 00 00 00 00 Last Mod Time         00000000 'Fri Nov 30 00:00:00 1979'
000000E 0000004 55 49 55 43 CRC                   43554955
0000012 0000004 87 6A A8 03 Compressed Length     03A86A87
0000016 0000004 9B 23 A8 03 Uncompressed Length   03A8239B
000001A 0000002 08 00       Filename Length       0008
000001C 0000002 00 00       Extra Length          0000
000001E 0000008 66 6C 61 67 Filename              'flag.zip'
                2E 7A 69 70
0000026 3A86A87 ...         PAYLOAD

3A86AAD 0000004 50 4B 01 02 CENTRAL HEADER #1     02014B50
3A86AB1 0000001 14          Created Zip Spec      14 '2.0'
3A86AB2 0000001 00          Created OS            00 'MS-DOS'
3A86AB3 0000001 14          Extract Zip Spec      14 '2.0'
3A86AB4 0000001 00          Extract OS            00 'MS-DOS'
3A86AB5 0000002 00 00       General Purpose Flag  0000
                            [Bits 1-2]            0 'Normal Compression'
3A86AB7 0000002 08 00       Compression Method    0008 'Deflated'
3A86AB9 0000004 00 00 00 00 Last Mod Time         00000000 'Fri Nov 30 00:00:00 1979'
3A86ABD 0000004 55 49 55 43 CRC                   43554955
3A86AC1 0000004 87 6A A8 03 Compressed Length     03A86A87
3A86AC5 0000004 9B 23 A8 03 Uncompressed Length   03A8239B
3A86AC9 0000002 08 00       Filename Length       0008
3A86ACB 0000002 00 00       Extra Length          0000
3A86ACD 0000002 00 00       Comment Length        0000
3A86ACF 0000002 00 00       Disk Start            0000
3A86AD1 0000002 00 00       Int File Attributes   0000
                            [Bit 0]               0 'Binary Data'
3A86AD3 0000004 00 00 00 00 Ext File Attributes   00000000
3A86AD7 0000004 00 00 00 00 Local Header Offset   00000000
3A86ADB 0000008 66 6C 61 67 Filename              'flag.zip'
                2E 7A 69 70

3A86AE3 0000004 50 4B 05 06 END CENTRAL HEADER    06054B50
3A86AE7 0000002 00 00       Number of this disk   0000
3A86AE9 0000002 00 00       Central Dir Disk no   0000
3A86AEB 0000002 01 00       Entries in this disk  0001
3A86AED 0000002 01 00       Total Entries         0001
3A86AEF 0000004 36 00 00 00 Size of Central Dir   00000036
3A86AF3 0000004 AD 6A A8 03 Offset to Central Dir 03A86AAD
3A86AF7 0000002 20 00       Comment Length        0020
3A86AF9 0000020 6D 61 64 65 Comment               'made for uiuctf by kuilin       '
                20 66 6F 72
                20 75 69 75
                63 74 66 20
                62 79 20 6B
                75 69 6C 69
                6E 00 00 00
                00 00 00 00
Done

Whew! That’s a lot of data, but it tells us a lot about our ZIP. First, we see that the local header at the beginning of the file records how the ZIP’s internal payload is compressed. We can also see that our payload starts 0x26 (38) bytes in, and that our footer takes up 108 bytes at the end (count ’em: a 54-byte central header plus a 54-byte end record, comment included). Using Python syntax, this means that for any ZIP laid out exactly like this one (a single entry named flag.zip, no extra fields, a 32-byte comment), the payload should be located at file[38:-108]. Perhaps we can extract the payload and do something with it.
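To double-check those offsets without counting bytes by eye, here’s a minimal sketch that parses the fixed 30-byte local header with Python’s struct module. The helper name payload_bounds is my own invention (not a library function), and the header bytes are simply copied from the zipdetails dump above.

```python
import struct

# Fixed 30-byte portion of a ZIP local file header, little-endian:
# signature, version needed, flags, method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def payload_bounds(data):
    """Return (method, start, end) for the first entry's payload.

    Hypothetical helper for illustration. Only the header bytes are
    read, so passing just the first 38 bytes of the file is enough.
    """
    fields = LOCAL_HEADER.unpack_from(data, 0)
    signature, method = fields[0], fields[3]
    comp_size, name_len, extra_len = fields[7], fields[9], fields[10]
    if signature != b"PK\x03\x04":
        raise ValueError("not a ZIP local file header")
    start = LOCAL_HEADER.size + name_len + extra_len
    return method, start, start + comp_size


# The 38 header bytes of flag.zip, copied from the zipdetails dump.
header = bytes.fromhex(
    "504b0304 1400 0000 0800 00000000 55495543"
    "876aa803 9b23a803 0800 0000"
) + b"flag.zip"

method, start, end = payload_bounds(header)
print(method, start, end)  # 8 38 61369005
print(61369113 - end)      # 108 bytes left over for the footer
```

The leftover 108 bytes line up with the size of orig.zip from the earlier ls output, which is a nice confirmation that nothing else is hiding in the file.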

Our flag.zip appears to be using “normal” compression (DEFLATE), which means that we won’t have much luck simply slicing out the ZIP payload: the bytes between header and footer are compressed, so they’d still need to be inflated before we could use them. Hey, remember that unzipping program we left running overnight, though? Let’s see what that file’s compression looks like (I saved a copy of it as save.zip):

$ zipdetails -v save.zip

0000000 0000004 50 4B 03 04 LOCAL HEADER #1       04034B50
0000004 0000001 14          Extract Zip Spec      14 '2.0'
0000005 0000001 00          Extract OS            00 'MS-DOS'
0000006 0000002 00 00       General Purpose Flag  0000
0000008 0000002 00 00       Compression Method    0000 'Stored'
<output truncated>

Now we’re talking! After unzipping for a certain amount of time, it looks like our ZIP files are just getting STORE’d recursively, with no additional compression applied. Since we know where the ZIP payload sits, we can theoretically snip off many ZIP “layers” at once by slicing the file directly, instead of waiting for unzip to extract one layer at a time.

Let’s inspect the raw hex of the file in xxd to check our hypothesis: (Note: use xxd flag.zip | less to avoid overloading your terminal with 50 MB of garbage)

00000000: 504b 0304 1400 0000 0000 0000 0000 5549  PK............UI
00000010: 5543 bb98 6402 bb98 6402 0800 0000 666c  UC..d...d.....fl
00000020: 6167 2e7a 6970 504b 0304 1400 0000 0000  ag.zipPK........
00000030: 0000 0000 5549 5543 2998 6402 2998 6402  ....UIUC).d.).d.
00000040: 0800 0000 666c 6167 2e7a 6970 504b 0304  ....flag.zipPK..
00000050: 1400 0000 0000 0000 0000 5549 5543 9797  ..........UIUC..
00000060: 6402 9797 6402 0800 0000 666c 6167 2e7a  d...d.....flag.z
00000070: 6970 504b 0304 1400 0000 0000 0000 0000  ipPK............
00000080: 5549 5543 0597 6402 0597 6402 0800 0000  UIUC..d...d.....
00000090: 666c 6167 2e7a 6970 504b 0304 1400 0000  flag.zipPK......
000000a0: 0000 0000 0000 5549 5543 7396 6402 7396  ......UIUCs.d.s.
000000b0: 6402 0800 0000 666c 6167 2e7a 6970 504b  d.....flag.zipPK
<output truncated>

We know from our Wikipedia browsing that ZIP files begin with the magic number PK\x03\x04, so we’re probably seeing a lot of ZIP files, nested within each other. This fits with our earlier theory that a STORE ZIP holds its payload directly in the file. Theoretically, that means that if we cut the file at the final PK\x03\x04 header and its corresponding footer, we can directly extract the payload from inside all these layers without unzipping it!
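Rather than scrolling through xxd output, the run of STORE headers can also be counted programmatically. This is only a sketch under the assumption that every layer uses the same 38-byte header we saw above; count_store_layers is a made-up helper, and the sample bytes are modeled on the first header in the hex dump.

```python
import struct

HEADER_LEN = 38  # 30 fixed bytes + the 8-byte name "flag.zip", no extra field


def count_store_layers(data):
    """Count consecutive STORE'd local headers at the start of `data`."""
    layers = 0
    offset = 0
    while (data[offset:offset + 4] == b"PK\x03\x04"
           and offset + HEADER_LEN <= len(data)):
        # The compression method is a little-endian uint16 at header offset 8.
        (method,) = struct.unpack_from("<H", data, offset + 8)
        if method != 0:  # 0 = Stored, 8 = Deflated
            break
        layers += 1
        offset += HEADER_LEN  # a STORE'd payload starts right after its header
    return layers


# Sample modeled on the hex dump: three STORE headers, then a DEFLATE header.
stored = bytes.fromhex(
    "504b0304 1400 0000 0000 00000000 55495543"
    "bb986402 bb986402 0800 0000"
) + b"flag.zip"
deflated = stored[:8] + b"\x08\x00" + stored[10:]
sample = stored * 3 + deflated + bytes(16)

print(count_store_layers(sample))  # 3
```

The walk stops at the first header whose method isn’t Stored, which is exactly the point where slicing stops working and real decompression is needed.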

Attempt 2: Manual Labor Always Works (Usually)

Let’s test our theory out by manually cutting out a payload. In save.zip, we can scroll down to find our final PK\x03\x04 header at location 0x827a. Since each header is 38 bytes long, that gives us 0x827a / 38 = 879 nested STORE headers (and as many footers) wrapped around it. Thus, we need to trim starting from byte 33402 (879 * 38), as well as cut off the final 94932 (879 * 108) bytes. Since save.zip has a total size of 40147277 bytes, I ran the following dd command:

dd if=save.zip of=out.zip iflag=skip_bytes,count_bytes,fullblock \
    bs=4096 skip=33402 count=40018943
    # Note that 40147277 - 879 * (38 + 108) = 40018943
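As a cross-check on the arithmetic, the same cut can be sketched as a plain Python slice. The function name strip_store_layers is hypothetical, and the constants assume the 38-byte header and 108-byte footer measured earlier.

```python
HEADER_LEN = 38   # local header + "flag.zip"
FOOTER_LEN = 108  # central header + end record + 32-byte comment


def strip_store_layers(data, layers):
    """Drop `layers` STORE wrappers from both ends of `data`."""
    if layers <= 0:
        return data
    return data[layers * HEADER_LEN:len(data) - layers * FOOTER_LEN]


# Re-deriving the numbers used in the dd command.
layers = 0x827A // HEADER_LEN
skip = layers * HEADER_LEN
count = 40147277 - layers * (HEADER_LEN + FOOTER_LEN)
print(layers, skip, count)  # 879 33402 40018943
```

Reading a 40 MB file into memory and slicing it is cheap, which is what makes the scripted approach later on so much faster than extracting layer by layer.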

Let’s inspect our out.zip to make sure our theory works:

$ zipdetails -v out.zip

0000000 0000004 50 4B 03 04 LOCAL HEADER #1       04034B50
0000004 0000001 14          Extract Zip Spec      14 '2.0'
0000005 0000001 00          Extract OS            00 'MS-DOS'
0000006 0000002 00 00       General Purpose Flag  0000
                            [Bits 1-2]            0 'Normal Compression'
0000008 0000002 08 00       Compression Method    0008 'Deflated'

Looks like we’re not done extracting yet. We’ve successfully cut away 879 ZIP “wrappers”, but our final payload is an actual compressed file, which we can’t work our dd magic on. To make matters worse, our file size has barely changed, only decreasing by 128334 bytes to 40018943. Peeling every remaining layer by hand like this would likely end with us zipped up ourselves, in a straitjacket.

Let’s see what happens if we extract out.zip and inspect it with xxd: (don’t forget less!)

00000000: 504b 0304 1400 0000 0000 0000 0000 5549  PK............UI
00000010: 5543 0132 6402 0132 6402 0800 0000 666c  UC.2d..2d.....fl
00000020: 6167 2e7a 6970 504b 0304 1400 0000 0000  ag.zipPK........
00000030: 0000 0000 5549 5543 6f31 6402 6f31 6402  ....UIUCo1d.o1d.
00000040: 0800 0000 666c 6167 2e7a 6970 504b 0304  ....flag.zipPK..
00000050: 1400 0000 0000 0000 0000 5549 5543 dd30  ..........UIUC.0

We lucked out! If we’ve got more STORE ZIPs inside our compressed ZIP, we can cut those away and only decompress when we encounter a DEFLATE ZIP.

Attempt 3: Scripting Stonks 📈📈

We’ve established that our flag.zip files seem to be a recursive structure, usually made up of a few hundred STORE layers that contain a compressed DEFLATE ZIP. With this knowledge, let’s establish a game plan for our desired behavior:

Given flag.zip, look for multiple PK\x03\x04 headers.
a. If we find multiple headers, our file has STORE layers. Cut the extra headers and footers off of the file to remove them.
b. If we don’t find many headers (1 or 2 are fine), our file has been compressed with DEFLATE. Extract the contents, which should yield more STORE layers.

With this plan, I wrote a Python script to automate the extraction process.

Solution

This Python3 program opens an input file flag.zip then executes the plan we described above. First, we use regex on the bytes of the file to find all matches of PK\x03\x04\x14\x00, which should match our ZIP headers. ²
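Here’s a small illustration of why the longer pattern helps. The sample bytes are entirely made up: three header-shaped chunks followed by some “compressed” junk that happens to contain the 4-byte magic number.

```python
import re

SIGNATURE = re.compile(rb"PK\x03\x04")              # bare 4-byte magic number
SIGNATURE_V20 = re.compile(rb"PK\x03\x04\x14\x00")  # magic + version 2.0 bytes

# Hypothetical sample: three 38-byte headers, then junk with a stray "PK\x03\x04".
header = b"PK\x03\x04\x14\x00" + bytes(24) + b"flag.zip"
sample = header * 3 + b"\x9c\x1fPK\x03\x04\xe7\x42" + bytes(8)

print(len(SIGNATURE.findall(sample)))      # 4 (one false positive)
print(len(SIGNATURE_V20.findall(sample)))  # 3
```

The longer pattern can still misfire on unlucky data, so it lowers the odds of a bad count rather than eliminating them.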

We take the number of matches we have and calculate where the final STORE header and footer are located, then slice them out of our bytes. (Don’t forget to subtract 1: Python is zero-indexed, so the last of n headers has index n - 1, and that last header belongs to the DEFLATE ZIP we want to keep.) We should be left with a DEFLATE ZIP file in memory, which we can convert to a ZipFile object and decompress. Rinse and repeat as needed.

Every 250 iterations, we also save our current ZIP file to our filesystem, ensuring that if our process breaks we haven’t lost all of the work we’ve already put in. If we encounter a file that can no longer be unzipped, this should be our flag!
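The decompress-or-stop step can be tried in isolation. Below is a minimal sketch using only the standard zipfile and io modules; unwrap_once is a hypothetical helper, and the tiny archive is built on the spot purely for demonstration.

```python
import io
import zipfile


def unwrap_once(data):
    """Return the first member's bytes, or None if `data` is not a ZIP."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return zf.read(zf.infolist()[0])
    except zipfile.BadZipFile:
        return None


# Build a tiny DEFLATE'd archive in memory to exercise the helper.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("flag.zip", b"not actually a zip")

print(unwrap_once(buf.getvalue()))         # b'not actually a zip'
print(unwrap_once(b"not actually a zip"))  # None
```

Wrapping the bytes in io.BytesIO keeps every layer in memory, so nothing touches the disk except the occasional checkpoint.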

My final output:

$ time python3 zipheck.py
Saving intermediary output...
Saving intermediary output...
Saving intermediary output...
Saving intermediary output...
Saving intermediary output...
Saving intermediary output...
Saving intermediary output...
Saving intermediary output...
Saving intermediary output...
Saving intermediary output...
Saving intermediary output...
b'uiuctf{tortoises_all_the_way_down}'

real    2m30.507s
user    2m13.005s
sys     0m16.040s

The final script!

#!/usr/bin/python3
# Author: Ptomerty

import re
import zipfile
import io

with open("flag.zip", "rb") as f:
	file_bytes = f.read()

header_len = 38
footer_len = 108

while True:
	try:
		for i in range(250):
			# Look for multiple PK headers
			last_header = len(re.findall(rb"PK\x03\x04\x14\x00",
				file_bytes)) - 1
			if last_header > 2:
				# snip STORE zips
				file_bytes = file_bytes[last_header * header_len:
					-1 * last_header * footer_len]

			# read ZIP file from memory
			zfile = zipfile.ZipFile(io.BytesIO(file_bytes), "r")
			# Decompress
			file_bytes = zfile.read(zfile.infolist()[0])

		# save intermediary ZIPs every so often
		with open("intermediary.zip", "wb") as f:
			print("Saving intermediary output...")
			f.write(file_bytes)
	except zipfile.BadZipFile:
		# if it's not a zip, it's our flag
		print(file_bytes)
		break

Flag: `uiuctf{tortoises_all_the_way_down}`

Notes

¹ One example of archive files that we still see today is tar, which takes multiple files as input and outputs a single, uncompressed archive file. Nowadays, the -z option automagically compresses your tar archive using gzip, which is why you almost never see raw .tar files in practice, but rather compressed .tar.gz or .tar.bz2 files.
² Yes, the ZIP header signature is technically only 4 bytes long. However, it’s possible that a compressed payload could randomly contain the bytes PK\x03\x04, which would look like a valid ZIP header. I noted that the 2 bytes after the signature were consistently equal to \x14\x00 and extended my pattern to decrease the likelihood of having to manually fix invalid matches.
