We feel confident when our security controls have handled something: they raised an alert and confirmed a quarantine, so we must have backed the right product. But how do you really validate that? Should we believe that the control saw everything and handled the entire problem?
Attackers are cunning; we repeatedly see distraction techniques used to lure responders away from the attack’s real objective. The way to validate the signals from your controls is to cross-check them against telemetry from the estate. Many of the indicators of compromise (IOCs) that we follow in Incident Response sit just below the surface and are relatively simple to uncover with standard admin tools. This telemetry is readily available to all organisations, and can be collected into a no/low-cost data lake that you can query for insight into key risk indicators and use to drive your cyber operational workflows.
What Do I Need to Know?
With a continuous stream of alerts entering your SIEM and IT operations service desk, you can very quickly become overrun by data. Only by combining the questions you want to ask with a good view of what your estate looks like (i.e., which assets are critical and who your admins are) can you begin correlating the two and get the most from your readily available telemetry.
There are many aspects of security telemetry that are interesting to collect and analyse, but if you are doing this on your own or with a small team, start with the following questions to find anomalous activity that you can scrutinise further:
- Are my security tools running everywhere?
- What time of day are people logging in?
- What user profiles exist on machines?
- What changes in privileges are occurring?
- Can I detect changes in my software baseline?
However, this is not enough; you must understand your estate to be able to use the telemetry. If you cannot correlate the two sets of data, you will quickly be overrun with IOCs. Make sure that you build up a sound understanding of the following:
- Who has admin privileges in my estate?
- What are my standard admin tools?
- How many machines do we have?
- What is the make-up of those machines (end-user computing (EUC) devices, servers, laptops, tablets, etc.)?
- Which compute assets are critical to the operation of the business?
- Which users represent the most risk?
- What is the approved software inventory?
- What does my software footprint look like from a network perspective?
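As a minimal sketch of correlating the two data sets, the snippet below answers "Are my security tools running everywhere?" by comparing an asset inventory with the hosts a security tool reports as checked in, listing uncovered assets with critical ones first. The inventory layout, field names and host names are hypothetical; substitute whatever your own asset register and tooling export.

```python
def coverage_gaps(inventory, reporting_hosts):
    """Return inventory hosts missing from the tool's report, critical assets first.

    inventory: dict mapping host name -> {"type": str, "critical": bool}
    reporting_hosts: iterable of host names seen by the security tool
    """
    seen = {host.strip().lower() for host in reporting_hosts}
    gaps = [
        (host, details)
        for host, details in inventory.items()
        if host.strip().lower() not in seen
    ]
    # False sorts before True, so negate "critical" to put critical assets first.
    gaps.sort(key=lambda item: (not item[1].get("critical", False), item[0].lower()))
    return [host for host, _ in gaps]


# Hypothetical inventory and agent check-in list, for illustration only.
inventory = {
    "FIN-SRV-01": {"type": "Server", "critical": True},
    "HR-LAP-07": {"type": "Laptop", "critical": False},
    "DC-01": {"type": "Server", "critical": True},
    "KIOSK-02": {"type": "EUC", "critical": False},
}
reporting = ["fin-srv-01", "KIOSK-02"]

print(coverage_gaps(inventory, reporting))
```

Host names are compared case-insensitively because inventories and agent consoles rarely agree on casing.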
When time is scarce, but you have concerns over the security of certain users/devices in your estate, you can record information from the Windows System Information application about installed software and software persistence. You can also monitor which user accounts are active on specific hosts. As long as you know what you are comparing against, these telemetry vectors can be strong indicators of compromise.
The software you find in the System Information output should match what you have included in your build, and nothing more. Any deviation could indicate that the logged-in user has too much privilege and can add new programs, which means an attacker who compromises that account could do the same.
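The comparison against your build can be sketched as follows, assuming you have already copied the installed software names for each host into a list (for example, from the System Information output). The host names, software names and the `software_deviations` helper are illustrative, not part of any tool.

```python
def normalise(name):
    """Lower-case and collapse whitespace so 'Google  Chrome ' matches 'google chrome'."""
    return " ".join(name.lower().split())


def software_deviations(installed_by_host, approved):
    """Map each host to the installed software that is not in the approved build."""
    approved_names = {normalise(name) for name in approved}
    deviations = {}
    for host, installed in installed_by_host.items():
        found = {normalise(name) for name in installed if normalise(name)}
        unexpected = sorted(found - approved_names)
        if unexpected:
            deviations[host] = unexpected
    return deviations


# Hypothetical data, for illustration only.
approved_build = ["Google Chrome", "7-Zip"]
installed = {
    "HR-LAP-07": ["Google Chrome", "7-Zip", "AnyDesk"],
    "FIN-SRV-01": ["7-zip "],
}

print(software_deviations(installed, approved_build))
```

Only hosts with deviations appear in the result, so an empty dictionary means every host matches the build.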
Also related to software are persistence mechanisms. A standard machine should have few scheduled tasks beyond those that ship with Windows and your build; attackers like to add tasks so that their implanted malicious software runs and survives a reboot. Keep an eye on which tasks appear: anything with a strange name, or running under an unexpected credential, is questionable.
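One way to check for "strange name or strange credential" is to compare a task export with a known-good baseline. The sketch below assumes a CSV export such as the one produced by `schtasks /query /fo csv /v`; the column names `TaskName` and `Run As User` are assumptions to verify against your own export, and the baseline and account lists are hypothetical.

```python
import csv
import io


def questionable_tasks(csv_text, baseline_tasks, expected_accounts,
                       name_col="TaskName", user_col="Run As User"):
    """Return (task, account, reasons) for tasks that deviate from the baseline."""
    baseline = {name.lower() for name in baseline_tasks}
    accounts = {account.lower() for account in expected_accounts}
    findings = []
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        task = (row.get(name_col) or "").strip()
        account = (row.get(user_col) or "").strip()
        # Skip blanks, repeated header rows and duplicate rows for the same task.
        if not task or task == name_col or (task, account) in seen:
            continue
        seen.add((task, account))
        reasons = []
        if task.lower() not in baseline:
            reasons.append("not in baseline")
        if account.lower() not in accounts:
            reasons.append("unexpected account")
        if reasons:
            findings.append((task, account, reasons))
    return findings


# Hypothetical export, trimmed to the columns used here.
SAMPLE_EXPORT = r'''"HostName","TaskName","Run As User","Task To Run"
"HR-LAP-07","\Microsoft\Windows\Defrag\ScheduledDefrag","SYSTEM","defrag.exe"
"HR-LAP-07","\xkqjz","HR-LAP-07\tempadmin","C:\Users\Public\u.exe"
'''

for finding in questionable_tasks(
    SAMPLE_EXPORT,
    baseline_tasks=[r"\Microsoft\Windows\Defrag\ScheduledDefrag"],
    expected_accounts=["SYSTEM"],
):
    print(finding)
```

The baseline is best captured from a freshly built machine, so that the tasks Windows creates itself do not show up as findings.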
Monitoring users can be as simple as a visual inspection of the profile folders on hosts that match your critical asset list. On your servers, you should only expect your sanctioned admin users; on EUC devices, you should only expect the primary user. If you see other profiles, it could indicate that others are logging onto these assets, so pay attention to each folder’s timestamp and contents. Out-of-hours (OOH) connections are also worth watching. A weekly regime of checking these two vectors on your critical assets could form an adequate basic sanity check.
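The visual inspection can also be scripted. The sketch below lists profile folders with their last-modified time and flags any that are unsanctioned or were last changed outside working hours. The 07:00 to 19:00 weekday window, the user names and the list of built-in folders to ignore are assumptions; a folder timestamp is only a rough signal of logon activity, so treat findings as prompts for a closer look.

```python
from datetime import datetime, time
from pathlib import Path

# Built-in Windows profile folders that are not interactive users (assumed list).
BUILT_IN = {"public", "default", "default user", "all users"}


def list_profiles(users_dir):
    """Return (folder name, last-modified datetime) for each profile folder."""
    profiles = []
    for entry in Path(users_dir).iterdir():
        if entry.name.lower() not in BUILT_IN and entry.is_dir():
            profiles.append((entry.name, datetime.fromtimestamp(entry.stat().st_mtime)))
    return profiles


def review_profiles(profiles, sanctioned, day_start=time(7, 0), day_end=time(19, 0)):
    """Flag profiles that are unsanctioned or were last modified out of hours."""
    allowed = {user.lower() for user in sanctioned}
    findings = []
    for name, modified in sorted(profiles):
        reasons = []
        if name.lower() not in allowed:
            reasons.append("unsanctioned profile")
        # Weekends (weekday 5 and 6) and times outside the window count as out of hours.
        if modified.weekday() >= 5 or not (day_start <= modified.time() < day_end):
            reasons.append("out-of-hours change")
        if reasons:
            findings.append((name, modified.isoformat(timespec="minutes"), reasons))
    return findings
```

On a critical server this might be run as `review_profiles(list_profiles(r"C:\Users"), sanctioned=["alice.admin"])`, where `alice.admin` stands in for your own sanctioned admin accounts.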
If you have more time, or perhaps security-focused members of your operational team, you can afford to delve deeper and try to automate data collection. An admin who is confident with command-line tools can extract information from one or many devices; PowerShell, Python and Windows CMD.exe can all be very useful. There is also the Windows Event Log, which holds a wealth of information and has a useful filter utility. Be warned, though: not everything creates a log event by default, so you will need to do some reading to find out which audit settings to turn on.
Using the CLI tools, it’s possible to build multiple queries into a script, with each query writing its output in CSV format. By collecting the same set of data from your critical assets on a schedule, you can start to build your data lake and compare results from one asset to another, or even from one time period to another. In addition, there are open-source tools that can help you visualise the collected data and provide a basic interface to perform queries. NCSC refers to LME (Logging Made Easy) as a DIY approach to this.
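A minimal sketch of such a script is shown below: it runs a set of named queries, stores one CSV row per output line, and reports lines that were not present in an earlier run. The query names, the CSV layout and the helper functions are illustrative assumptions; the two example commands (`netstat -ano` and `schtasks /query /fo csv`) are standard Windows tools, but swap in whichever commands answer your own questions.

```python
import csv
import socket
import subprocess
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["collected_at", "host", "query", "line"]

# Example queries only; replace with the commands that suit your estate.
QUERIES = {
    "connections": ["netstat", "-ano"],
    "scheduled_tasks": ["schtasks", "/query", "/fo", "csv"],
}


def run_command(command):
    """Run one command and return its non-empty output lines."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=120)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def collect(queries, runner=run_command, host=None, collected_at=None):
    """Run every query and return one row (dict) per output line."""
    host = host or socket.gethostname()
    collected_at = collected_at or datetime.now(timezone.utc).isoformat(timespec="seconds")
    return [
        {"collected_at": collected_at, "host": host, "query": name, "line": line}
        for name, command in queries.items()
        for line in runner(command)
    ]


def append_rows(rows, csv_path):
    """Append rows to a CSV file, writing the header if the file is new or empty."""
    path = Path(csv_path)
    needs_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)


def new_since(previous_rows, current_rows):
    """Return rows whose (host, query, line) did not appear in the previous run."""
    known = {(r["host"], r["query"], r["line"]) for r in previous_rows}
    return [r for r in current_rows if (r["host"], r["query"], r["line"]) not in known]
```

A scheduled job could call `append_rows(collect(QUERIES), "telemetry.csv")` on each critical asset. Comparing runs line by line works best for slow-changing data such as scheduled tasks; connection lists change constantly, so they usually need parsing before a comparison is meaningful.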
Standard Admin Tools available to gather telemetry include:
- DNS: Audit and analytical logs from your DNS servers can identify the scale of an infection, repeat infections and out-of-hours configuration changes.
- Local Firewall: Your firewall’s local event logs can help you identify out-of-hours remote sessions and policy changes.
- Netstat: Periodically checking your netstat output gives you a list of open connections and associated services, in which you can look for peer-to-peer connections, external connections and risky ports.
- Registry: Within the HKLM and HKCU hives, including the Run keys, pull out host fingerprint information to contextualise your findings, and also check the registry for software persistence, patching information, and local users and groups.
- Prefetch: Prefetch files, together with your application caches and binaries (DLL and SYS files), can help you uncover evidence of which files have executed, out-of-place binaries, and binaries with random names.
- Services and Scheduled Tasks: If you open services.msc, you will see a full list of your services; Task Scheduler (taskschd.msc) gives the equivalent view of scheduled tasks. If anything in there looks rogue, stop and question it.
- User Profiles: In addition to inspecting users in the registry, it’s useful to check Windows Explorer for the existence of user profile folders, as they provide an insight into potential lateral movement. If we validate the folders’ timestamps and their content, we also get an idea of when users are logging in.
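As an illustration of the netstat check, the sketch below parses the TCP rows of `netstat -ano` style output and flags established connections that go to an external address or involve a watched port. The column layout assumed is the English-language Windows output (protocol, local address, foreign address, state, PID), and the `RISKY_PORTS` watch list is hypothetical.

```python
import ipaddress

# Hypothetical watch list; tune it to the services you do not expect to see.
RISKY_PORTS = {23, 445, 3389, 5900}


def parse_netstat(text):
    """Parse TCP rows from `netstat -ano` style output into dictionaries."""
    connections = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5 or parts[0].upper() != "TCP":
            continue
        _, local, remote, state, pid = parts
        local_host, _, local_port = local.rpartition(":")
        remote_host, _, remote_port = remote.rpartition(":")
        if not (local_port.isdigit() and remote_port.isdigit()):
            continue
        connections.append({
            "local_host": local_host, "local_port": int(local_port),
            "remote_host": remote_host, "remote_port": int(remote_port),
            "state": state, "pid": pid,
        })
    return connections


def is_external(host):
    """True when the address is outside private, loopback, link-local and unspecified ranges."""
    try:
        # Drop IPv6 brackets and any zone suffix such as "%12" before parsing.
        address = ipaddress.ip_address(host.strip("[]").split("%")[0])
    except ValueError:
        return False
    return not (address.is_private or address.is_loopback
                or address.is_link_local or address.is_unspecified)


def flag_connections(connections, risky_ports=RISKY_PORTS):
    """Return (connection, reasons) for established sessions worth a second look."""
    flagged = []
    for conn in connections:
        if conn["state"].upper() != "ESTABLISHED":
            continue
        reasons = []
        if is_external(conn["remote_host"]):
            reasons.append("external address")
        if conn["local_port"] in risky_ports or conn["remote_port"] in risky_ports:
            reasons.append("risky port")
        if reasons:
            flagged.append((conn, reasons))
    return flagged
```

The PID in each flagged row can then be matched to a process name to decide whether the connection belongs to one of your standard admin tools.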
Making the most of your readily available telemetry allows you to quickly identify anomalies that require follow-up, as well as gain insight to drive your cyber operational workflows. If you want help with how you manage this at scale, please reach out to us.