Our first iteration – setupapi_parser.v1.py
The goal of our first iteration is to develop a functional prototype that we will improve upon in later iterations. We will continue to see the following code block in all our scripts, which provides basic documentation about the script:
001 __author__ = 'Preston Miller & Chapin Bryce'
002 __date__ = '20160401'
003 __version__ = 0.01
004 __description__ = 'This script reads a Windows 7 Setup API log and prints USB Devices to the user'
Our script involves three functions, which are outlined below. The main() function kicks off the script by calling the parseSetupapi() function. This function reads the setupapi.dev.log file and extracts the USB device and first installation date information. For each entry it extracts, the printOutput() function is called, which prints the information to the user in the console. These three functions work together to allow us to segment our code based on operation:
007 def main():
...
023 def parseSetupapi(setup_file):
...
041 def printOutput(usb_name, usb_date):
To run this script, we need to provide code that calls the main() function. The code block at the end of this explanation shows a Python feature that we will use in almost every one of our scripts throughout this book. This section of code will become more complex throughout this chapter as we allow users to control input and output and to supply optional arguments.
Line 53 is simply an if statement that checks to see if this script is called from the command line. In more detail, the __name__ attribute holds the name under which Python is running the current module. When __name__ is equal to the string '__main__', the file is the top-level script and was therefore most likely executed at the command line. This feature is especially important when designing code that may be called by another script. Someone else may import your functions into their code and, without this conditional, our script would run immediately when imported. We have the following code:
053 if __name__ == '__main__':
054     # Run the program
055     main()
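The effect of the conditional can be illustrated outside of a real script. The following is a minimal sketch, not part of the book's listing: run_if_top_level() is a hypothetical helper that receives the module name as an argument so that both outcomes can be observed, and this main() is a stand-in that returns a string instead of parsing anything.

```python
def main():
    """Stand-in entry point used only for this illustration."""
    return 'main() was called'


def run_if_top_level(module_name):
    """Mimic the guard: call main() only when the name is '__main__'."""
    if module_name == '__main__':
        return main()
    # Any other name means the file was imported, so nothing runs.
    return None


as_script = run_if_top_level('__main__')
as_import = run_if_top_level('setupapi_parser')
```

Here as_script holds the return value of main(), while as_import is None, which mirrors what happens when another script imports our functions.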
As seen in the flow chart below, the trunk function (our script as a whole) calls the main() function, which in turn calls parseSetupapi(), which finally calls the printOutput() function.
Designing the main() function
The main() function, defined on line 7, is fairly straightforward in this scenario. The function handles initial variable assignments and setup before calling parseSetupapi(). In the code block below, we create a docstring, surrounded by triple double quotes, where we define the purpose of the function along with the data returned by it, as seen on lines 8 through 11. Pretty sparse, right? We'll enhance our documentation as we proceed, as things might change drastically this early in development.
007 def main():
008     """
009     Run the program
010     :return: None
011     """
After the docstring, we hardcode the path to the setupapi.dev.log file on line 13. Because this is a relative path, our script can only function correctly if a file with this name is located in the current working directory, which is the script's own directory when we run the script from there.
013     file_path = 'setupapi.dev.log'
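A relative path such as this one is resolved against the current working directory, which coincides with the script's directory only when the script is launched from there. The short sketch below, which is an illustration and not part of the script, shows where Python will actually look for the file; the variable name resolved is our own.

```python
import os

# The same hardcoded, relative path used by the script.
file_path = 'setupapi.dev.log'

# abspath() joins a relative path onto the current working directory,
# which is the location open() would use for this path.
resolved = os.path.abspath(file_path)
```

Printing resolved before opening the file is a quick way to confirm that the script is looking in the directory you expect.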
On lines 16 through 18, we print setup information, including script name and version, to the console, which notifies the user that the script is running. In addition, we print out 22 equal signs to provide a visual distinction between the setup information and any other output from the script:
015     # Print version information when the script is run
016     print '='*22
017     print 'SetupAPI Parser, ', __version__
018     print '='*22
Finally, on line 20, we call our next function to parse the input file. This function expects a str object representing the path to the setupapi.dev.log file. Though it may seem to defeat the purpose of a main function, we will place the majority of the functionality in a separate function. This allows us to reuse code dedicated to the primary functionality in other scripts. An example of this will be shown in the final iteration of this Setup API code. See line 20:
020     parseSetupapi(file_path)
Crafting the parseSetupapi() function
The parseSetupapi() function, defined on line 23, takes a string input that represents the full path to the Windows 7 or higher setupapi.dev.log file, as detailed by the docstring on lines 24 through 28. On line 29, we open the file path provided by the main() function and assign the resulting file object to a variable named in_file. This open statement doesn't specify a mode, so it uses the default, which opens the file in read-only mode. This prevents us from accidentally writing to the file. In fact, trying to write() to a file opened in read-only mode results in the following error and message:
IOError: File not open for writing
Although read-only mode prevents our script from writing to the file, it is not a substitute for write-blocking technology, which should always be used when handling digital evidence. If there is any confusion regarding files and their modes, refer to Chapter 1, Now For Something Completely Different, for additional detail. See the following code:
023 def parseSetupapi(setup_file):
024     """
025     Interpret the file
026     :param setup_file: path to the setupapi.dev.log
027     :return: None
028     """
029     in_file = open(setup_file)
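The read-only behavior of the default mode can be checked with a throwaway file. This is a minimal sketch under our own assumptions: write_is_rejected() is a hypothetical helper, and the scratch file is created with tempfile so no evidence file is touched. On Python 3 the exception raised is io.UnsupportedOperation, which is still caught by except IOError.

```python
import os
import tempfile


def write_is_rejected(path):
    """Return True if writing to a file opened with the default mode fails."""
    handle = open(path)  # no mode given, so the file is opened as 'r'
    try:
        handle.write('tamper')
    except IOError:
        # Python 2 raises IOError here. Python 3 raises
        # io.UnsupportedOperation, a subclass of OSError (alias IOError).
        return True
    finally:
        handle.close()
    return False


# Hypothetical scratch file standing in for a log.
fd, sample_path = tempfile.mkstemp(suffix='.log')
os.close(fd)
rejected = write_is_rejected(sample_path)
os.remove(sample_path)
```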
On line 30, we read each line from the in_file variable into a variable named data using the file object's readlines() method, which creates a list. Each element in the list represents a single line in the file. In more detail, each element in the list is a string of text from the file, ending with the newline character, \n, that delimits it. After each newline character, the data is broken off and the text that follows is added as a new entry in the data list.
030     data = in_file.readlines()
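The shape of the list returned by readlines() is easiest to see on a small in-memory example. The sketch below uses io.StringIO as a stand-in for a real file object, which is our assumption for illustration only; note that each element keeps its trailing newline.

```python
import io

# An in-memory stand-in for an open file containing two lines.
sample = io.StringIO(u'first line\nsecond line\n')

# readlines() returns one list element per line, newline included.
data = sample.readlines()

# Removing the trailing newline is left to the caller, for example:
stripped = [line.rstrip('\n') for line in data]
```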
With the content of the file stored in the variable data, we begin a for loop to walk through each individual line. This loop uses the enumerate() function, which wraps our iterator with a counter that keeps track of the number of iterations. This is desirable because we want to check for the pattern that identifies a USB device entry, then read the following line to get our date value.
By keeping track of what element we are currently processing, we can easily pull out the next line we need to process with data[i + 1], where i is the enumerated count of the current line being processed.
032     for i,line in enumerate(data):
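The look-ahead pattern can be sketched on a small list. The helper find_following_lines() and the sample data are hypothetical and exist only for this illustration. Unlike the script, the sketch checks that a following line exists, because indexing data[i + 1] on the last element raises an IndexError.

```python
def find_following_lines(data, marker):
    """Pair each line containing marker with the line that follows it."""
    pairs = []
    for i, line in enumerate(data):
        if marker in line.lower():
            # Guard against a match on the final line, where
            # data[i + 1] would raise an IndexError.
            if i + 1 < len(data):
                pairs.append((line.strip(), data[i + 1].strip()))
    return pairs


sample = [
    'header\n',
    'Device Install - example\n',
    'Section start 2010/11/10\n',
    'Device Install - last\n',
]
pairs = find_following_lines(sample, 'device install')
```

The final element matches the marker but has no successor, so it is skipped instead of raising an error.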
Once inside the loop, on line 33, we evaluate if the current line contains the string 'device install (hardware initiated)'. To ensure that we don't miss valuable data, we make the comparison case insensitive by using the lower() method to convert all characters in the current line to lowercase. If responsive, we execute lines 34 through 36. On line 34, we use the current iteration count variable, i, to access the responsive line within the data object.
033         if 'device install (hardware initiated)' in line.lower():
034             device_name = data[i].split('-')[1].strip()
After accessing the value, we call the split() method on the string to split the values on the dash (-) character. After splitting, we access the second value in the split list and call the strip() method on that string. The strip() method, without any provided values, will strip whitespace characters on the left and right ends of the string. We process the responsive line into one containing just USB identifying information:
Prior to processing:
'>>> [Device Install (Hardware initiated) - pci\ven_8086&dev_100f&subsys_075015ad&rev_01\4&b70f118&0&0888]'
Post processing:
'pci\ven_8086&dev_100f&subsys_075015ad&rev_01\4&b70f118&0&0888]'
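The transformation above can be reproduced step by step. This sketch uses the sample line as a raw string so that the backslashes are kept literally. The tidy_name variation at the end is our own suggestion, not part of the script: it splits only on the first dash, so an identifier that itself contains a dash is not truncated, and it drops the trailing bracket.

```python
# Raw string so that sequences such as \v and \4 are not treated as escapes.
raw_line = (r'>>>  [Device Install (Hardware initiated) - '
            r'pci\ven_8086&dev_100f&subsys_075015ad&rev_01\4&b70f118&0&0888]')

# Same steps as the script: split on the dash, take the second piece,
# then strip surrounding whitespace.
parts = raw_line.split('-')
device_name = parts[1].strip()

# Hypothetical variation: split once only and remove the closing bracket.
tidy_name = raw_line.split('-', 1)[1].strip().rstrip(']')
```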
After converting the first line from the setupapi.dev.log USB entry, we then access the data variable on line 35 to obtain the date information from the following line. We know the date value sits on the line after the device information. We can use the iteration count variable, i, and add one to access that next line. Similarly to the device line parsing, we call the split() method on the line, this time using the string "start" as the delimiter, and extract the second element from the split, which represents the date. Before saving the value, we need to call strip() to remove whitespace on both ends of the string.
035             date = data[i+1].split('start')[1].strip()
This process removes any other characters besides the date:
Prior to processing:
'>>> Section start 2010/11/10 10:21:14.656'
Post processing:
'2010/11/10 10:21:14.656'
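The date extraction can be checked the same way. The final step in this sketch goes beyond what the script does: converting the string into a datetime object with strptime() is our own addition, and the format string is an assumption based on the single sample line shown above.

```python
from datetime import datetime

raw_line = '>>>  Section start 2010/11/10 10:21:14.656'

# Same steps as the script: split on 'start', keep what follows, strip.
date = raw_line.split('start')[1].strip()

# Optional extra step (not in the script): parse the text into a
# datetime so it can be compared or sorted. %f accepts the milliseconds.
parsed = datetime.strptime(date, '%Y/%m/%d %H:%M:%S.%f')
```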
On line 36, we pass our extracted device_name and date values to the printOutput() function. This function is called repeatedly for any responsive lines found in the loop. After the loop completes, the code on line 38 executes, which closes the setupapi.dev.log file that we initially opened, releasing the file from Python's use.
036             printOutput(device_name, date)
037
038     in_file.close()
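An alternative to calling close() explicitly is the with statement, which closes the file automatically when the block ends, even if an error occurs inside it. The script does not use this form; the sketch below is offered only as a comparison, and read_lines() and the scratch file are hypothetical.

```python
import os
import tempfile


def read_lines(path):
    """Read all lines and report whether the file was closed afterwards."""
    with open(path) as in_file:
        data = in_file.readlines()
    # Leaving the with block has already closed the file.
    return data, in_file.closed


# Hypothetical scratch file standing in for a log.
fd, sample_path = tempfile.mkstemp(suffix='.log')
with os.fdopen(fd, 'w') as scratch:
    scratch.write('first\nsecond\n')

lines, was_closed = read_lines(sample_path)
os.remove(sample_path)
```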
Developing the printOutput() function
The printOutput() function, defined on line 41, allows us to control how the data is displayed to the user. The function requires two strings as input that represent the USB name and date, as defined by the docstring. On lines 49 and 50, we print the USB data using the format() method. As discussed in Chapter 1, Now For Something Completely Different, this method replaces the curly brackets ({}) with the data provided in the method call. A simple example like this doesn't show off the full power of the format() method. However, this method can allow us to perform complex string formatting with ease. After printing the input, execution returns to the calling function, where the script continues with the next iteration of the loop, as follows:
041 def printOutput(usb_name, usb_date):
042     """
043     Print the information discovered
044     :param usb_name: String USB Name to print
045     :param usb_date: String USB Date to print
046     :return: None
047     """
048
049     print 'Device: {}'.format(usb_name)
050     print 'First Install: {}'.format(usb_date)
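To give a flavor of what format() can do beyond simple substitution, the sketch below shows a positional field, a named field, and fixed-width alignment. The helper format_entry() and its sample values are hypothetical, and it returns the strings instead of printing them so the results can be inspected.

```python
def format_entry(usb_name, usb_date):
    """Build three example strings with str.format()."""
    # Positional field, as used in the script.
    positional = 'Device: {}'.format(usb_name)
    # Named field, filled by keyword argument.
    named = 'First Install: {date}'.format(date=usb_date)
    # Left-align in 8 columns, then right-align in 8 columns.
    aligned = '{:<8}|{:>8}'.format('left', 'right')
    return positional, named, aligned


positional, named, aligned = format_entry('example_device',
                                          '2010/11/10 10:21:14.656')
```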
Running the script
We now have a script that takes a setupapi.dev.log file, found on Windows 7 or higher, and outputs USB entries with their associated timestamps. The following screenshot shows how to execute the script with a sample setupapi.dev.log file that has been provided in the code bundle. Your output may vary depending on the setupapi.dev.log file you use the script on.
We modified the supplied setupapi.dev.log so USB entries appeared at the top of the output for the screenshot. Our current iteration generates some false positives, because it extracts every "responsive" line, including those for devices that are not USB devices.
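One possible way to reduce these false positives is to check the extracted device name before printing it. The sketch below is only an assumption about how such a filter might look and is not the approach taken by this iteration: is_usb_entry() is a hypothetical helper, it assumes that USB identifiers begin with the text usb, and the second sample name is invented for illustration.

```python
def is_usb_entry(device_name):
    """Return True if the device name looks like a USB identifier.

    Assumption: USB identifiers start with 'usb' (for example usb\\...
    or usbstor\\...), regardless of case.
    """
    return device_name.lower().startswith('usb')


# The PCI entry comes from the sample line shown earlier in this section.
pci_name = r'pci\ven_8086&dev_100f&subsys_075015ad&rev_01\4&b70f118&0&0888]'
# Invented identifier, used only to exercise the filter.
usb_name = r'USB\VID_0000&PID_0000\example]'

kept = [name for name in (pci_name, usb_name) if is_usb_entry(name)]
```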