Learning Python for Forensics


Our first iteration – setupapi_parser.v1.py

The goal of our first iteration is to develop a functional prototype that we will improve upon in later iterations. We will continue to see the following code block in all our scripts, which provides basic documentation about the script:

001 __author__ = 'Preston Miller & Chapin Bryce'
002 __date__ = '20160401'
003 __version__ = 0.01
004 __description__ = 'This script reads a Windows 7 Setup API log and prints USB devices to the user'

Our script is made up of three functions, outlined below. The main() function kicks off the script by calling the parseSetupapi() function. This function reads the setupapi.dev.log file and extracts the USB device name and first installation date from each matching entry. For every entry it finds, parseSetupapi() calls the printOutput() function, which prints the extracted information to the user in the console. These three functions work together to allow us to segment our code based on operation:

007 def main():
...
023 def parseSetupapi(setup_file):
...
041 def printOutput(usb_name, usb_date):

To run this script, we need code that calls the main() function. The following code block shows a Python feature that we will use in almost every script in this book. This section of code will grow more complex throughout this chapter as we give users control over the input, the output, and optional arguments.

Line 53 is an if statement that checks whether this script is being executed directly, for example from the command line. Every module has a __name__ attribute: when a file is run as the top-level script, Python sets its __name__ to the string '__main__', and when the file is imported, __name__ is set to the module's name instead. This distinction is especially important when designing code that may be used by another script. Someone else may import our functions into their code; without this conditional, importing our script would immediately run it. We have the following code:

053 if __name__ == '__main__':
054     # Run the program
055     main()
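To see the guard in action without creating two files, the following Python 3 sketch executes the same small module source twice: once with __name__ set to an ordinary module name, as an import would, and once with it set to '__main__', as running the file directly would. The module source, its calls list, and the load_as() helper are hypothetical illustrations, not part of the book's script.

```python
import types

# Hypothetical module source that uses the same guard as line 53.
MODULE_SOURCE = """
calls = []

def main():
    calls.append('main')

if __name__ == '__main__':
    main()
"""


def load_as(name):
    """Run MODULE_SOURCE inside a fresh module object whose __name__ is `name`."""
    module = types.ModuleType(name)  # ModuleType sets module.__name__ = name
    exec(MODULE_SOURCE, module.__dict__)
    return module


imported = load_as('setupapi_parser')  # mimics `import setupapi_parser`
as_script = load_as('__main__')        # mimics `python setupapi_parser.py`

print(imported.calls)   # [] because the guard kept main() from running
print(as_script.calls)  # ['main']
```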

As seen in the flow chart below, the trunk function (our script as a whole) calls the main() function, which in turn calls parseSetupapi(), which finally calls the printOutput() function.


Designing the main() function

The main() function, defined on line 7, is fairly straightforward in this scenario. It handles initial variable assignments and setup before calling parseSetupapi(). In the code block below, on lines 8 through 11, we create a docstring, surrounded by triple double quotes, that defines the purpose of the function and the data it returns. Pretty sparse, right? We'll enhance our documentation as we proceed, since things might change drastically this early in development.

007 def main():
008     """
009     Run the program
010     :return: None
011     """

After the docstring, we hardcode the path to the setupapi.dev.log file on line 13. Because this is a relative path, our script can only function correctly if a file with this name is located in the current working directory, that is, the directory from which we run the script.

013     file_path = 'setupapi.dev.log'
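Because 'setupapi.dev.log' is a relative path, Python resolves it against the current working directory rather than the folder that holds the script. This Python 3 sketch prints where Python would actually look; the comparison line simply spells out that rule.

```python
import os

file_path = 'setupapi.dev.log'

# Relative paths are resolved against the current working directory,
# not against the folder that contains the script.
resolved = os.path.abspath(file_path)
print('Python will look for:', resolved)
print(resolved == os.path.normpath(os.path.join(os.getcwd(), file_path)))
```

If you would rather look for the log next to the script itself, os.path.dirname(os.path.abspath(__file__)) gives the script's folder, which you could join with the file name.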

On lines 16 through 18, we print setup information, including the script name and version, to the console to notify the user that the script is running. We also print 22 equal signs above and below it to visually separate the setup information from any other output from the script:

015     # Print version information when the script is run
016     print '='*22
017     print 'SetupAPI Parser, ', __version__
018     print '='*22
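Lines 16 through 18 use Python 2 print statements. If you are following along in Python 3, where print is a function, an equivalent banner could look like the sketch below; wrapping it in a print_banner() helper is our own hypothetical choice for illustration.

```python
__version__ = 0.01


def print_banner(version):
    """Print the version banner between two rows of 22 equal signs (Python 3 syntax)."""
    print('=' * 22)
    # print() separates its arguments with a single space by default.
    print('SetupAPI Parser,', version)
    print('=' * 22)


print_banner(__version__)
```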

Finally, on line 20, we call our next function to parse the input file. This function expects a str object representing the path to the setupapi.dev.log. Though it may seem to defeat the purpose of a main function, we will place the majority of the functionality in a separate function. This allows us to reuse code dedicated to the primary functionality in other scripts. An example of this will be shown in the final iteration of this Setup API code. See line 20:

020     parseSetupapi(file_path)

Crafting the parseSetupapi() function

The parseSetupapi() function, defined on line 23, takes a string input that represents the full path to the setupapi.dev.log file found on Windows 7 or higher, as detailed by the docstring on lines 24 through 28. On line 29, we open the file path provided by the main() function and assign the resulting file object to a variable named in_file. Because this open() call doesn't specify a mode, it uses the default mode, 'r', which opens the file read-only. This prevents us from accidentally writing to the file. In fact, trying to write() to a file opened in read-only mode results in the following error and message:

IOError: File not open for writing
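The IOError above comes from Python 2. In Python 3 the same mistake raises io.UnsupportedOperation instead. The sketch below triggers it deliberately on a throwaway temporary file created with tempfile, which stands in for real evidence.

```python
import io
import os
import tempfile

# A throwaway temporary file stands in for setupapi.dev.log.
fd, path = tempfile.mkstemp(suffix='.log')
os.close(fd)

try:
    in_file = open(path)  # no mode given, so the default 'r' (read-only) applies
    try:
        in_file.write('tampering attempt')
    except io.UnsupportedOperation as error:
        message = str(error)
        print('Write refused:', message)
    finally:
        in_file.close()
finally:
    os.remove(path)
```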

Although read-only mode prevents our script from writing to the file, you should still always use write-blocking technology when handling digital evidence. If there is any confusion regarding files and their modes, refer to Chapter 1, Now For Something Completely Different, for additional detail. See the following code:

023 def parseSetupapi(setup_file):
024     """
025     Interpret the file
026     :param setup_file: path to the setupapi.dev.log
027     :return: None
028     """
029     in_file = open(setup_file)

On line 30, we use the file object's readlines() method to read every line of in_file into a list, which we store in a variable named data. Each element in the list represents a single line in the file: the text is split after each newline (\n) character, and each resulting line, including its trailing newline, becomes a new entry in the data list.

030     data = in_file.readlines()
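The following Python 3 sketch shows the shape of the list that readlines() builds. An io.StringIO object stands in for the opened log file (it supports the same readlines() method), and the sample text is made up. Note that each element keeps its newline, and the last element has none if the file does not end with one.

```python
import io

# io.StringIO stands in for the file object returned by open().
in_file = io.StringIO('first line\nsecond line\nlast line without newline')
data = in_file.readlines()
in_file.close()

print(data)
# ['first line\n', 'second line\n', 'last line without newline']
```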

With the content of the file stored in the variable data, we begin a for loop to walk through each individual line. This loop uses the enumerate() function, which wraps our iterator with a counter that keeps track of the number of iterations. This is desirable because we want to check for the pattern that identifies a USB device entry, then read the following line to get our date value.

By keeping track of what element we are currently processing, we can easily pull out the next line we need to process with data[n + 1], where n is the enumerated count of the current line being processed.

032     for i, line in enumerate(data):
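The Python 3 sketch below isolates the look-ahead pattern: the counter from enumerate() lets us reach the following element with data[i + 1]. It also adds a bounds check that the book's line 35 does not have, since a match on the very last line would make data[i + 1] raise an IndexError. The sample list and the 'match' marker are hypothetical.

```python
data = ['header\n', 'match: device A\n', 'detail for A\n', 'match: device B\n']

pairs = []
for i, line in enumerate(data):
    if line.startswith('match'):
        # Guard the look-ahead: the last element has no following line.
        next_line = data[i + 1] if i + 1 < len(data) else None
        pairs.append((line.strip(), next_line.strip() if next_line else None))

print(pairs)
# [('match: device A', 'detail for A'), ('match: device B', None)]
```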

Once inside the loop, on line 33, we evaluate whether the current line contains the string 'device install (hardware initiated)'. To ensure that we don't miss valuable data, we make the comparison case-insensitive by converting the current line to lowercase with the lower() method. If the line is responsive, we execute lines 34 through 36. On line 34, we use the current iteration count variable, i, to access the responsive line within the data object.

033         if 'device install (hardware initiated)' in line.lower():
034             device_name = data[i].split('-')[1].strip()

After accessing the value, we call the split() method on the string to split it on the dash (-) character. After splitting, we access the second value in the resulting list and call the strip() method on it. Without any arguments, strip() removes whitespace characters from both ends of the string. This processes the responsive line into one containing just the device's identifying information (note that the trailing bracket remains, because strip() only removes whitespace):

Prior to processing:

'>>> [Device Install (Hardware initiated) - pci\ven_8086&dev_100f&subsys_075015ad&rev_01\4&b70f118&0&0888]'

Post processing:

'pci\ven_8086&dev_100f&subsys_075015ad&rev_01\4&b70f118&0&0888]'
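You can reproduce this transformation in a Python 3 session, as sketched below. Note the raw string prefix (r'...'): the device path contains backslashes, and in a normal string literal a sequence such as \v would be read as a vertical-tab escape.

```python
line = r'>>> [Device Install (Hardware initiated) - pci\ven_8086&dev_100f&subsys_075015ad&rev_01\4&b70f118&0&0888]'

# Case-insensitive match, as on line 33.
print('device install (hardware initiated)' in line.lower())  # True

# Split on '-', keep the second piece, trim surrounding whitespace (line 34).
device_name = line.split('-')[1].strip()
print(device_name)
# pci\ven_8086&dev_100f&subsys_075015ad&rev_01\4&b70f118&0&0888]
```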

After converting the first line of the setupapi.dev.log USB entry, we access the data variable on line 35 to obtain the date information. We know the date value sits on the line after the device information, so we add one to the iteration count variable, i, to access that next line. As with the device line, we call the split() method, this time splitting on the word "start", and extract the second element of the resulting list, which holds the date. Before saving the value, we call strip() to remove whitespace from both ends of the string.

035             date = data[i+1].split('start')[1].strip()

This process removes any other characters besides the date:

Prior to processing:

'>>> Section start 2010/11/10 10:21:14.656'

Post processing:

'2010/11/10 10:21:14.656'
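The date extraction on line 35 can be checked the same way in a Python 3 session. As an optional extra step that the first iteration does not take, the sketch also converts the text into a datetime object; the format string is our assumption, based only on this one sample line.

```python
from datetime import datetime

date_line = '>>> Section start 2010/11/10 10:21:14.656'

# Split on the word 'start' and trim whitespace (line 35).
date = date_line.split('start')[1].strip()
print(date)  # 2010/11/10 10:21:14.656

# Optional: parse into a datetime; %f accepts the 3-digit fraction here.
parsed = datetime.strptime(date, '%Y/%m/%d %H:%M:%S.%f')
print(parsed.isoformat())  # 2010-11-10T10:21:14.656000
```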

On line 36, we pass our extracted device_name and date values to the printOutput() function. This function is called once for each responsive line found in the loop. After the loop completes, line 38 closes the setupapi.dev.log file that we initially opened, releasing it from Python's use.

036             printOutput(device_name, date)
037 
038     in_file.close()
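Closing the file by hand on line 38 works, but if an exception were raised before that line, the file would stay open until Python cleaned it up. A common alternative is the with statement, which closes the file automatically when its block ends. The Python 3 sketch below reworks lines 23 through 38 along those lines; the names parse_setupapi() and parse_lines() are hypothetical, and moving the matching into a separate generator is our choice so the logic can be exercised on an in-memory sample.

```python
import io


def parse_lines(lines):
    """Yield (device_name, date) pairs using the same matching as lines 33 to 35."""
    for i, line in enumerate(lines):
        # Skip a match on the final line, which has no date line after it.
        if 'device install (hardware initiated)' in line.lower() and i + 1 < len(lines):
            device_name = line.split('-')[1].strip()
            date = lines[i + 1].split('start')[1].strip()
            yield device_name, date


def parse_setupapi(setup_file):
    """Print each device and first-install date found in setup_file."""
    # The with block closes the file when it ends, even if reading raises an error.
    with open(setup_file) as in_file:
        data = in_file.readlines()
    for device_name, date in parse_lines(data):
        print('Device: {}'.format(device_name))
        print('First Install: {}'.format(date))


# Hypothetical two-line sample in the same shape as the book's examples.
sample = io.StringIO(
    '>>>  [Device Install (Hardware initiated) - usb\\vid_0781&pid_5567\\0001]\n'
    '>>>  Section start 2012/03/04 05:06:07.890\n'
)
pairs = list(parse_lines(sample.readlines()))
print(pairs)
```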

Developing the printOutput() function

The printOutput() function, defined on line 41, allows us to control how the data is displayed to the user. As defined by the docstring, the function requires two strings as input that represent the USB name and date. On lines 49 and 50, we print the USB data using the string format() method. As discussed in Chapter 1, Now For Something Completely Different, this method replaces the curly brackets ({}) with the data provided in the method call. A simple example like this doesn't show off the full power of the format() method; however, it allows us to perform complex string formatting with ease. After printing the input, execution returns to the calling function, parseSetupapi(), where the script continues with the next iteration of the loop, as follows:

041 def printOutput(usb_name, usb_date):
042     """
043     Print the information discovered
044     :param usb_name: String USB Name to print
045     :param usb_date: String USB Date to print
046     :return: None
047     """
048 
049     print 'Device: {}'.format(usb_name)
050     print 'First Install: {}'.format(usb_date)
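The following Python 3 sketch hints at what else format() can do: fixed-width alignment, named fields, and reusing a value by index. The sample values and the column layout are hypothetical; they only illustrate how printOutput() could be extended, not what the book's script prints.

```python
usb_name = r'usb\vid_0781&pid_5567'
usb_date = '2010/11/10 10:21:14.656'

# Same output as lines 49 and 50.
print('Device: {}'.format(usb_name))
print('First Install: {}'.format(usb_date))

# Left-align the label in a 15-character column.
row = '{:<15}{}'.format('Device:', usb_name)
print(row)

# Named fields make longer templates easier to read.
summary = '{name} first seen {date}'.format(name=usb_name, date=usb_date)
print(summary)

# Positional indexes let one value appear more than once.
repeated = '{0} | {1} | {0}'.format('A', 'B')
print(repeated)  # A | B | A
```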

Running the script

We now have a script that takes a setupapi.dev.log file, found on Windows 7 or higher, and outputs USB entries with their associated timestamps. The following screenshot shows how to execute the script with a sample setupapi.dev.log file that has been provided in the code bundle. Your output may vary depending on the setupapi.dev.log file you use the script on.

We modified the supplied setupapi.dev.log so that USB entries would appear at the top of the output for the screenshot. Our current iteration appears to generate some false positives by extracting "responsive" lines that pertain to non-USB devices, such as the PCI device in the earlier parsing example.
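One simple way to cut down those false positives, sketched below in Python 3 as an assumption rather than the approach taken in later iterations, is to keep only device IDs whose enumerator (the text before the first backslash) begins with usb, which matches both usb\... and usbstor\... entries. The is_usb_device() helper is hypothetical, and the sample IDs other than the PCI ID from the earlier example are made up.

```python
def is_usb_device(device_name):
    """Hypothetical filter: True when the device ID's enumerator starts with 'usb'."""
    enumerator = device_name.split('\\')[0].lower()
    return enumerator.startswith('usb')


candidates = [
    r'pci\ven_8086&dev_100f&subsys_075015ad&rev_01\4&b70f118&0&0888]',
    r'usb\vid_0781&pid_5567\4c530001230509112212]',
    r'usbstor\disk&ven_sandisk&prod_cruzer_blade&rev_1.26\4c530001230509112212&0]',
]
usb_only = [name for name in candidates if is_usb_device(name)]
print(len(usb_only))  # 2
```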