#2 It’s over Quintillion! - PDF cracking, how hard can it be?!
We all know passwords have been used to secure our accounts, documents and other things for years, and some of us have even fallen victim to forgetting them at times.
Thinking back to my time at Bournemouth University, we had one assessment that was fun but also a bit of a brain teaser, and this weekend I recreated that challenge.
The aim: break into a password-protected PDF without being given the password by the user, using the power of a 2GB Raspberry Pi, the password-protected PDF itself and a laptop with an internet connection (since at the time I was waiting for a micro HDMI cable to access my Pi’s GUI).
Since this is me, I met some interesting challenges along the way, including freezing my Raspberry Pi 4B to the point of having to force-restart it, maxing out the 16GB of RAM in my gaming PC, and trying to count past a billion, which ended with me going over a quintillion (past 66 quintillion, for that matter)!
The code:
The original idea was for my housemate to password-protect a PDF with a password of up to 10 (ASCII) characters, where any character (upper- and lower-case letters, symbols and numbers) could appear. As a result, I needed to create a Python script to serve 2 functions:
1. Create a password list (also known as a password dictionary) containing every possible password:
a. This was done by creating a for loop to convert the character codes 32 to 126 into ASCII characters (the characters typically found on most (UK) keyboards) and add them to a list. Additionally, I added the GBP sign (£), which is missing from ASCII.
b. The challenging bit was finding a way to go through every combination of the characters in my list, up to a password length of 10. Searching online, I found that Python’s standard-library itertools module has a function called “product()” that acts like a nested for loop. It loops through my ASCII list to build combinations of its characters, repeating this a specified number of times via its repeat argument (in my case 10). Strictly, repeat=10 produces every combination of exactly 10 characters, so covering shorter passwords as well means running it once for each length from 1 to 10. Each combination is output as a tuple, an unchangeable list of items.
c. I created a for loop to copy each combination (a tuple of characters) into a new list, so it would be usable in section 2 of my code.
d. When it came to testing section 1 of my code, I was greeted with an error I hadn’t seen before: “Killed”. It turned out the operating system had killed my script for running out of memory, very nearly taking my Raspberry Pi down with it. I lowered the product() repeat value to 6 and ran the code again, and once more it was “Killed”. Further testing, this time on my gaming PC, showed that I had maxed out the feeble 1.6GB of available RAM on my Pi and even the 16GB of RAM on my gaming PC. A quick bit of math revealed why: with 96 possible characters in each of 10 positions, finding every possible 10-character password meant generating 96^10 = 66,483,263,599,150,104,576 values. Even counting each password as just one byte, that’s over 66 exabytes (66 billion GB); at roughly a byte per character it’s closer to 665 exabytes, and my script was trying to hold all of it in RAM at once!!
e. Due to this “issue”, I had to “move the goalposts of my own game”. Instead, my housemate picked a password from a list of the top 10 million passwords found on the internet (GitHub link here for those curious about the password dictionary).
f. As a result, lines 3-20 of my code were no longer needed, so I commented out most of them. On reflection, I should have swapped lines 1 and 2 and commented out lines 2 to 20 (since I no longer needed to import itertools or the empty variable “asci_list”).
2. Open the PDF and test each possible password until the correct one is found, then display it:
a. The second section of the code was relatively simple to create, since pikepdf (a Python library for working with PDFs) has a helpful guide. My code opened the password list text file and stored the contents of the password dictionary as a list in a variable called “pw_list”.
b. Next, a for loop tried to open the password-protected PDF with each password in the list and, once it found the correct one, printed that password to the console.
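Steps 1a to 1c above can be sketched in Python as follows. This is a minimal sketch, not the original script: it assumes the character set is codes 32-126 plus £ (the 96 characters implied by the 66 quintillion figure), and `candidates()` is a hypothetical helper name. Instead of copying every combination into a list, it is a generator that yields one password at a time, so memory use stays flat; it also loops over each length from 1 to 10, because product() with repeat=10 on its own only produces 10-character combinations.

```python
from itertools import product
from typing import Iterator, Sequence

# Printable ASCII runs from code 32 (space) to 126 ("~");
# range() stops before its end value, so range(32, 127) covers 32-126.
chars = [chr(code) for code in range(32, 127)]
chars.append("£")  # the GBP sign is not part of ASCII, so add it by hand


def candidates(charset: Sequence[str], max_length: int) -> Iterator[str]:
    """Yield every password of length 1 to max_length, one at a time."""
    for length in range(1, max_length + 1):
        # product(charset, repeat=length) acts like `length` nested for loops
        # and yields each combination as a tuple of characters.
        for combo in product(charset, repeat=length):
            yield "".join(combo)  # turn the tuple into a plain string


# Tiny example with a two-character set instead of all 96.
print(list(candidates(["a", "b"], 2)))  # ['a', 'b', 'aa', 'ab', 'ba', 'bb']
```

A generator fixes the “Killed” problem but not the time problem: even at an assumed (and generous) billion guesses per second, the 96^10 ten-character candidates alone would take over 2,000 years to get through.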
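The arithmetic in step 1d can be checked with a few lines of Python. The constant names are illustrative; the sketch assumes the same 96-character set and, for the size estimate, one byte per character with no Python overhead (real Python string objects use considerably more than that).

```python
CHARSET_SIZE = 96    # 95 printable ASCII characters (codes 32-126) plus "£"
MAX_LENGTH = 10
EXABYTE = 10 ** 18   # decimal exabyte, in bytes

# Exactly 10 characters: 96 choices for each of the 10 positions.
exactly_ten = CHARSET_SIZE ** MAX_LENGTH

# Every length from 1 to 10, which "up to 10 characters" strictly needs.
up_to_ten = sum(CHARSET_SIZE ** n for n in range(1, MAX_LENGTH + 1))

# Raw size at one byte per character, ignoring all Python object overhead.
raw_bytes = exactly_ten * MAX_LENGTH

print(f"{exactly_ten:,} passwords of exactly 10 characters")  # 66,483,263,599,150,104,576 passwords of exactly 10 characters
print(f"{up_to_ten:,} passwords of 1 to 10 characters")
print(f"about {raw_bytes / EXABYTE:.0f} EB of raw text")       # about 665 EB of raw text
```

Python integers have no fixed size limit, so these 20-digit values are computed exactly rather than rounded.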
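Section 2 can be sketched with pikepdf, whose `pikepdf.open()` takes a `password` argument and raises `pikepdf.PasswordError` when the password is wrong. This is a hedged sketch rather than the author’s code: `read_wordlist()`, `opens_pdf()` and `find_password()` are hypothetical helper names. It streams the wordlist line by line instead of loading it all into a list like “pw_list”, which keeps memory low on a Pi, and it keeps the search loop separate from the PDF check so the loop can be tried without a real PDF.

```python
from typing import Callable, Iterable, Iterator, Optional


def read_wordlist(path: str) -> Iterator[str]:
    """Yield one candidate password per line without loading the whole file."""
    # Undecodable bytes are dropped rather than raising an error.
    with open(path, encoding="utf-8", errors="ignore") as f:
        for line in f:
            yield line.rstrip("\r\n")  # drop the line ending, keep other characters


def opens_pdf(pdf_path: str, password: str) -> bool:
    """Return True if pikepdf can open the PDF with this password."""
    import pikepdf  # third-party (pip install pikepdf); imported here so the rest runs without it

    try:
        with pikepdf.open(pdf_path, password=password):
            return True
    except pikepdf.PasswordError:  # raised when the password is wrong
        return False


def find_password(candidates: Iterable[str],
                  is_correct: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that is_correct accepts, or None if none match."""
    for password in candidates:
        if is_correct(password):
            return password
    return None
```

Used as `find_password(read_wordlist("passwords.txt"), lambda pw: opens_pdf("protected.pdf", pw))`, where both file names are placeholders for your own wordlist and PDF.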
This exercise taught me a useful lesson: it can be relatively easy to write a script that attempts to break into a password-protected (PDF) document, but creating password dictionaries can be time-consuming and can require a large amount of computing resources.
Sites like https://www.passwordmonster.com/ can show you how secure your password may (or may not) be.
Potential security features that reduce the likelihood of an adversary (hacker) brute-forcing their way into your document or account include 2FA (two-factor authentication) or, more broadly, MFA (multi-factor authentication), limiting the number of login attempts, using CAPTCHAs, etc.
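To illustrate the “limited number of login attempts” idea, here is a minimal sketch of a per-user lockout counter. The class name, the five-attempt limit and the five-minute lockout are assumptions chosen for illustration. This kind of limit protects online accounts, where a server checks every guess; an offline file like a PDF can be attacked as fast as the attacker’s hardware allows, so there the password’s length and randomness are what matter.

```python
import time
from typing import Dict, Optional


class LoginLimiter:
    """Lock a user out for lockout_seconds after max_attempts failed logins in a row."""

    def __init__(self, max_attempts: int = 5, lockout_seconds: float = 300.0) -> None:
        self.max_attempts = max_attempts
        self.lockout_seconds = lockout_seconds
        self.failures: Dict[str, int] = {}        # user -> consecutive failures
        self.locked_until: Dict[str, float] = {}  # user -> time the lockout ends

    def is_locked(self, user: str, now: Optional[float] = None) -> bool:
        """Check this before even testing the password."""
        now = time.monotonic() if now is None else now
        return now < self.locked_until.get(user, float("-inf"))

    def record(self, user: str, success: bool, now: Optional[float] = None) -> None:
        """Update the counters after a login attempt."""
        now = time.monotonic() if now is None else now
        if success:
            self.failures.pop(user, None)  # a good login resets the count
            return
        self.failures[user] = self.failures.get(user, 0) + 1
        if self.failures[user] >= self.max_attempts:
            self.locked_until[user] = now + self.lockout_seconds
            self.failures[user] = 0  # start counting afresh once the lock expires
```

The `now` parameter only exists so the clock can be fixed in tests; in normal use it is left out and the monotonic clock is used.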