So after the last post on automation by CG I figured I'd change track a little and take a look at something that is of interest to me. Data Visualization. I've found it to be incredibly useful in not only tracking down anomalous behavior on a network but also for displaying metrics and data to the non technical folks. Visualizing data from a security perspective can show some interesting and valuable information.
Take NetFlow logs for example. While there is no payload information, if you're able to gather the logs from an entire network you can determine everything from infected hosts scanning your network to attacks against a web application. I've not actually done it but I'm sure there are ways to apply visualization to offsec as well. Maybe not in the actual pentest but possibly in the reporting/debriefing stage. It would be interesting to graph the path of compromise into the network with the attack vectors used and overlay that with the defensive measures [Patching policy, IDS/IPS, FW, etc...] in place in the target environment. It would be a really powerful way of showing how the network was compromised. If I get some time I'll try and gather some data and generate some visuals depicting that. I strongly recommend 'Applied Security Visualization' by Rafael Marty and 'Security Metrics' by Andrew Jaquith for information on data visualization and metrics for information security.
Lately I've been looking at the Waledac aka 'The New Stormworm' bot and it's fast flux capabilities. It's spreading via email spam messages. These e-mails contain different domains that are part of a fast flux network. Each of the domains has it's TTL set to 0 for the DNS record so virtually each time the domains are resolved a new IP address is returned. I took 82 known malicious domains associated with Waldac and using a simple perl script ran forward lookups against each domain gathering the ip and country information associated with that IP. I then stripped out any repeat addresses. I gathered approximately 2500 unique IP's in the short time I ran the script. Using Rafael Marty's Afterglow scripts and Graphviz's Neato I generated some simple link graphs of the source <> target pairs. In this case the source would be the domain and the target would be the fluxing IP address.
What's interesting about this graph is that it's immediately apparent that some domains are not fluxing and appear to be static. Perhaps this is intentional, perhaps they are no longer actively being used in the spam emails and are no longer part of the flux network or perhaps it's a misconfiguration on the authoritative DNS server for these domains. Unless you have access to Layer 7 traffic in an environment it's somewhat difficult to detect traffic to these domains using only NetFlow traffic but Emerging Threats and the folks over at the Shadowserver Organization have Snort sigs for detecting traffic from infected hosts.
As a curiosity I took the country location information for the IP addresses of the infected hosts and looked at the Top 10 Countries from that list. It's interesting to see that China and Korea have the largest percentage of infected hosts. This may be skewed by the time my script was running or it may be that the bot has been spreading longer in those countries. The above data, while interesting, is probably not that practical from an security admins perspecitive though.
Recently I needed to determine the potential number of hosts infected by the IE7 XML Parsing exploit. In order to do so I gathered NetFlow logs from the environment and next built up a list of known domains hosting the exploit. This list was gathered from multiple sources. Obviously this list is not comprehensive and likely there are numerous other domains hosting this and other exploits but it was a good start. Next I determined the IP addresses for these domains and using some Perl and the NetFlow traffic generated a list of unique source IP ranges with established communication to those malicious domains.
After generating the link graphs I was provided with a quick visual representation of which hosts had communicated with those domains. This created a quick list of potential infected hosts. I've since updated the graphs to show different colors for the subnets in the network, allowing me to determine areas that show greater infection rates. The next step is to determine the patching levels in those subnets with infections. This was done by looking at the SUS server logs. If the host was patched it could be assumed that it was not infected. Of course this assumes that the patch actually installed correctly too. :)
These malicious domains hosted more that one browser exploit though. Some of these domains had upwards of 20 unique exploits listed. Now we all know from using exploits for various 3rd party apps that these are seldom at the latest patch levels. So likley many of these stations were infected in some manner. The next step would be to determine what sites hosted what exploits and add an additional column in the csv file that contained this exploit information. Re graphing the traffic would show which hosts communicated to those sites. So if you knew that all manged/imaged workstations in that location had Adobe Acrobat 8.0 installed and some had visited a site hosting a pdf exploit you could determine that they were possibly infected. This does assume that you know what software is on those systems though. Waaay too many assumptions for my taste. The advantage of my graphing this data was that it had a very powerful visual impact that backed up my recommendation to establish and maintain block lists on the firewalls and to improve patching policies for third party apps. A very tangible dollar value could be applied to this as well. Each machine has a cost associated with it being infected and now needing to be rebuilt and patched before being allowed back into the network. Add to this the loss of employee productivity and you have a strong argument for blocking and improved patching controls.
There are other methods of doing the same thing but hey, I like looking at visual representations of the data and when dealing with huge data sets it can assist in picking out patterns that may hint at infections or other anomalous traffic.