Cyber Security in Mission Critical Networks

In 1973 the original Ethernet technology was designed, implemented and tested by Robert Metcalfe and his team at Xerox’s Palo Alto Research Center as a way to share data amongst computers over a distributed communications network. Over the last 50 years, this technology has shaped global society in ways that could scarcely have been dreamed of by these early pioneers. The birth of the Internet in the mid-eighties helped move us into a completely new age of understanding and information sharing. In the beginning the Ethernet and TCP/IP standards were not reliable or deterministic enough to be used in time-sensitive applications, but over the years advancements in technology led to Ethernet being slowly considered for, and then adopted for, industrial control and monitoring applications. Having a distributed communications network that covers an entire industrial site and supports everything from monitoring to control has proved invaluable at many different levels. This has led to Ethernet being accepted as the standard communications network for manufacturing, mining and, more recently, extremely time-sensitive applications such as electrical grid monitoring and control under the IEC 61850 suite of standards.

However, with such a collection of technologies will of course come those who want to exploit the technology, and the companies and sites that use it. Over the past 10-15 years, cyber security has become the buzzword and main focus when discussing communications networks, whether talking about the Internet, a private control and monitoring network for an industrial site or a government-owned network encompassing a country’s electrical grid. This is for good reason, as the number of attacks on all types of networks keeps rising. While at first these “attacks” were often no more than a teenager sitting in their parents’ basement testing their own abilities, we now increasingly see very serious attacks that aim to steal, destabilize and cause massive damage or loss of life. The Stuxnet worm was first discovered in 2010, and it was a massive wake-up call to an industrial community that was, at the time, implementing industrial Ethernet for large portions of monitoring and certain control applications.

As with many technologies, threats exist when using the technology, but this does not stop us from using it. Rather, the focus has changed when implementing such networks and the attached systems. Originally the focus was mostly on stability and reliability (for industrial networks) and on bandwidth and availability (for home and corporate environments). In both environments the first priority has now become securing the network, with everything else falling into place behind that primary goal. Whilst in an industrial setting reliability and availability of end services remain a high priority, we would rather lose some of that temporarily than allow the system to be controlled by the wrong person.

Figure 1 – OSI Reference Model

Ethernet and TCP/IP networks and technologies are commonly described in terms of the Open Systems Interconnection (OSI) reference model. Its main goal is to provide a set of standards to be followed whilst still allowing enough leeway within each of the layers for innovation and adaptation, which is one of the main reasons these protocols have been accepted on a global scale throughout all economic sectors. The OSI model allows manufacturers to produce products at specific layers without necessarily having to worry about how data or hardware is going to work at other layers. For instance, a network interface card works primarily at the physical layer, although these days it will generally also include microcontrollers and processors that handle layer 2, and often parts of layer 3 as well, through drivers and so on. However, the engineer designing the physical RJ45 port only needs to worry about how the electrical signals received on the pins are converted to 1s and 0s to be passed up to the data link layer, and vice versa for the bits received from the data link layer. As long as they follow the correct standards for interfacing to layer 2, it does not matter how they design their circuitry to convert bits into electrical signals (as long as those signals also follow the standards). Even more so, this engineer does not need to care about what is happening at the higher layers, i.e. what type of data or protocols will be using the RJ45 port they are designing.

However, this open systems approach to Ethernet is also one of its biggest unavoidable flaws. Allowing this level of open interoperation between layers invariably leads to vulnerabilities, and these vulnerabilities are what most attacks on a network will try to exploit. For this reason we need to consider network security from two perspectives. One is the big picture view, where we consider the network and the entire site or application it is being used for. This is important because our end goal is not so much to protect the network itself as to protect the devices and systems that use it. A network switch cannot itself control a busbar, for instance, but a control relay attached to that switch might be able to. From a practical point of view, however, we cannot simply look at the big picture and come up with a single holistic plan to protect everything. Trying to do so will generally lead to incomplete protection, as it becomes impossible to see the individual trees for the forest.

Instead we need to keep this big picture in the back of our minds while we focus on the individual smaller pictures and how to secure each of them. For instance, most people leaving the house will lock their front door, get in the car, drive out through the property gate and close it behind them. Both the front door and the gate are part of the overall security, yet they are separate systems, each with its own pros and cons. Cyber security is very similar. As networks are structured around the seven-layer OSI reference model, the model is a useful framework for thinking about how to secure a critical communications network.

One should start by looking at layer 1, the physical layer. From a security point of view this can be split further into two parts. First we have physical access to the hardware itself, which is handled with fairly standard security practice. We ensure that hardware is kept in an access-controlled environment, either a control room that is generally occupied 24/7, or locked in secure cabinets and enclosures. On top of this, access to the site itself is generally controlled and secured by traditional security measures.

The second part of layer 1 that we consider is the cabling connecting all the devices together. With traditional copper cable (Cat5e, Cat6 and Cat7) it is quite possible to tap into the data running on the lines using a physical device. This would allow an attacker to capture and eavesdrop on the traffic travelling on these lines, or to mount a man-in-the-middle attack, where they pretend to be a legitimate device on the network but are actually able to change the data being transmitted. This is even more dangerous than simply interrupting the data stream. The gradual move to fibre optic cables, especially for backbones and long hauls, has made this type of attack harder and less effective in most cases, but even with fibre it is still possible.

Securing these cables is also partly a matter of physical access. Within a site they can normally be secured relatively easily, especially if buried in underground trunking or run in high-strung cable trays, but when they run across public ground between sites we need to be more careful. Again, physical access should be restricted and if possible monitored, but even here there will be locations that cannot be 100% protected. For this reason we also add extra protection at some of the other layers, discussed in more detail later, such as encrypting the data running over these lines. Even if an attacker eavesdrops on this data, they will then get little value out of it without the correct keys to decrypt it.

This brings us to layer 2 traffic, the heart of Ethernet LANs and critical to any control system. Layer 2 is where our traffic lives while travelling within a local network, and as such it is generally not as heavily encrypted or secured as traffic at layer 3 and above (where we focus on security at the perimeters of our networks). Note that we will often still encrypt traffic using higher-layer protocols, for example by using SSH rather than Telnet to connect to devices (SSH encrypts the traffic), but there are also ways in which we can protect data at layer 2 itself. In industrial networking this is most commonly handled by VLANs (Virtual Local Area Networks), an open standard way to segregate traffic on the network. In a corporate environment, IP addressing and subnets are used to logically segregate traffic and are generally considered sufficient (although corporate networks are now also starting to implement VLANs alongside IP structuring for added backend security).

While VLANs do not add encryption or any other direct cyber security protection, they do allow traffic to be segregated at layer 2 (and, in effect, at layer 1). With VLANs we can statically configure switch ports to belong only to certain groups, which means we can completely stop traffic meant for another group from even being sent out of a port, so the end device never sees the frame. With IP structuring alone, a device in one group (i.e. subnet) can still receive packets from another group, for example broadcast traffic, and will then inspect and discard them. Using the right exploit, however, an attacker could still get an end device to process a packet meant for another IP subnet, even if only enough to exploit some vulnerability in the way packets are decoded.
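The difference between a flat network and one segregated by VLANs can be sketched with a toy model in Python. This is an illustration only, not a switch implementation: the port names, VLAN IDs, attached devices and both helper functions are hypothetical.

```python
# Toy model of VLAN-based segregation on a single switch.
# Port names, VLAN IDs and attached devices are made up for illustration.

# Static port-to-VLAN assignment (access ports, one VLAN each).
PORT_VLAN = {
    "port1": 10,  # protection relay
    "port2": 10,  # bay controller
    "port3": 20,  # CCTV camera
    "port4": 20,  # office PC
}


def egress_ports(ingress_port, port_vlan):
    """Ports a broadcast frame is sent out of when VLANs are configured.

    The switch floods only within the VLAN of the ingress port, so devices
    in other VLANs never receive the frame at all.
    """
    vlan = port_vlan[ingress_port]
    return sorted(
        port for port, port_vid in port_vlan.items()
        if port_vid == vlan and port != ingress_port
    )


def egress_ports_flat(ingress_port, port_vlan):
    """Same switch without VLANs: the broadcast reaches every other port."""
    return sorted(port for port in port_vlan if port != ingress_port)


print(egress_ports("port3", PORT_VLAN))       # ['port4']
print(egress_ports_flat("port3", PORT_VLAN))  # ['port1', 'port2', 'port4']
```

In the flat case the relay on port1 still has to receive and discard the camera's broadcast; with VLANs the frame never leaves the switch on that port.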

VLANs also offer a host of other benefits, such as reduced congestion within each VLAN and prioritization at a VLAN level. And while VLANs were originally designed only for network management (i.e. end devices were never originally expected to interact with VLANs), they have proven to be a very efficient way to start segregating traffic at the end device level. One example is GOOSE traffic in IEC 61850, where end devices are starting to tag traffic for VLANs directly rather than leaving this to the network. This has proven a very effective way to take an existing protocol that is supported across the industrial networking sector and extend it to provide benefit and functionality to end devices, while still remaining compatible with existing networks. It is not hard to imagine more protocols in the future using this existing functionality in creative and useful ways. In the writer’s experience, in fact, VLANs are probably the most critical and least understood part of local industrial networking at present. They are often incorrectly designed or not utilized properly, meaning a hugely beneficial feature of industrial networking is being sorely underused.
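To make the idea of an end device tagging its own traffic concrete, the following Python sketch builds and parses the 4-byte IEEE 802.1Q tag that carries both the VLAN ID and the priority. The function names are hypothetical, and the VLAN ID and priority used in the example are arbitrary illustrative values, not recommendations for any particular GOOSE deployment.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tagged frame


def build_vlan_tag(vlan_id, priority=0, dei=0):
    """Return the 4-byte 802.1Q tag: 16-bit TPID followed by 16-bit TCI.

    TCI layout: 3 bits priority (PCP), 1 bit drop eligible (DEI),
    12 bits VLAN ID. VLAN IDs 0 and 4095 fit the field but are reserved.
    """
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_tag(tag):
    """Decode a 4-byte 802.1Q tag back into its fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {"priority": tci >> 13, "dei": (tci >> 12) & 1, "vlan_id": tci & 0xFFF}


tag = build_vlan_tag(vlan_id=100, priority=4)
print(tag.hex())            # 81008064
print(parse_vlan_tag(tag))  # {'priority': 4, 'dei': 0, 'vlan_id': 100}
```

Because the priority bits travel in the same tag as the VLAN ID, a switch can both segregate and prioritize a tagged frame without looking any deeper into it.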

We now move up to layer 3, where IP addresses live and routing occurs. Routing is often used on industrial networks, especially within utilities, where a system may span provinces/states or even countries. While each substation or individual site is generally a LAN (where data is further segregated by VLANs), we connect these sites together using WAN connections. From a critical network point of view a WAN should always be considered at least somewhat insecure, unless you have full control of every LAN connected to it as well as every WAN link and segment. Even if a WAN is privately owned by the same company that owns the site and its LAN, we should always expect and plan for possible attacks from outside the LAN. Note that in this context an attack is not always a premeditated or malicious event. It could be traffic or triggers caused by other devices that are not configured correctly, or even accidents by outside users (for instance connecting to and configuring the wrong device in a substation by mistake). Basically we should always consider the possibility of incorrect or dangerous traffic originating on the WAN, and protect against it.

Commonly this is accomplished with an ACL (Access Control List), which these days is normally part of a layer 3 and 4 firewall. An ACL allows us to determine which IP addresses on the outside are allowed to communicate with which addresses on the inside, and vice versa. Adding layer 4 firewall functionality on top of this (this is where the clear-cut breaks between the layers start to blur from a security point of view) allows us to refine these rules further, determining which protocols or TCP/UDP ports are allowed to communicate through the firewall (i.e. from inside to outside and vice versa).
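A minimal sketch of how such a rule set might be evaluated is given below, using Python's standard `ipaddress` module. It assumes first-match-wins ordering with a default deny, which is a common convention but not universal across products. The `Rule` class, the `evaluate` function and all addresses and rules are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                      # "permit" or "deny"
    src: str                         # source network, CIDR notation (layer 3)
    dst: str                         # destination network, CIDR notation (layer 3)
    protocol: Optional[str] = None   # "tcp", "udp" or None for any (layer 4)
    dst_port: Optional[int] = None   # destination port or None for any (layer 4)


def evaluate(rules, src_ip, dst_ip, protocol, dst_port):
    """Return the action of the first matching rule, or "deny" if none match."""
    for rule in rules:
        if ip_address(src_ip) not in ip_network(rule.src):
            continue
        if ip_address(dst_ip) not in ip_network(rule.dst):
            continue
        if rule.protocol is not None and rule.protocol != protocol:
            continue
        if rule.dst_port is not None and rule.dst_port != dst_port:
            continue
        return rule.action
    return "deny"


# Illustrative addressing: control centre 10.1.0.0/24, substation LAN 10.20.0.0/24.
RULES = [
    Rule("permit", "10.1.0.0/24", "10.20.0.0/24", "tcp", 22),   # SSH from control centre
    Rule("deny", "0.0.0.0/0", "10.20.0.0/24", "tcp", 23),       # no Telnet from anywhere
    Rule("permit", "10.1.0.10/32", "10.20.0.5/32", "tcp", 102), # one host to one device
]

print(evaluate(RULES, "10.1.0.15", "10.20.0.5", "tcp", 22))   # permit
print(evaluate(RULES, "10.1.0.15", "10.20.0.5", "tcp", 23))   # deny
print(evaluate(RULES, "192.0.2.7", "10.20.0.5", "tcp", 22))   # deny
print(evaluate(RULES, "10.1.0.10", "10.20.0.5", "tcp", 102))  # permit
```

The `src` and `dst` fields correspond to the layer 3 part of the rule, while `protocol` and `dst_port` are the layer 4 refinement described above.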

With a layer 3 and 4 firewall (or one that also covers the higher layers, as Next Generation Firewalls (NGFWs) do), we have a high level of control over the traffic entering or leaving our networks. Here we can also start looking at encrypted tunnels such as IPsec or TLS tunnels, where we create a logical connection between two end points and encrypt any traffic travelling over it. In this way we can use existing insecure WANs (most notably the Internet) to connect sites logically, even when older devices on those sites force us to use unencrypted protocols. For instance, a lot of older devices only support Telnet or HTTP, both of which send data as clear text across the network. This might be acceptable at the LAN level, but not if we need to send this data from the control centre to a device in the field over an insecure WAN. If we use an encrypted tunnel, traffic that is not natively encrypted has a layer of encryption added before being sent across the Internet, and it can only be decrypted by the device at the far end of the tunnel.

So far we have looked at more passive security measures on a network, such as not allowing access from outside networks, or encrypting and segregating the traffic between end points as much as possible. In recent years, however, even this view has started to evolve, and we have to be more active and adaptive in the security measures applied to a network. For instance, remote access is becoming more and more critical to these systems, which often require engineers from around the world to connect in to commission, maintain or troubleshoot them. For this reason, even though we are aware of the vulnerabilities of distributed WANs, we have to find ways to keep using them. As mentioned, a common way to do this today is with encrypted tunnels, more commonly known as Virtual Private Networks (VPNs). These allow remote users to connect to the network using digital certificates or various other authentication options, which can then determine what access they have to the network. However, this itself opens up the possibility of attackers obtaining these VPN credentials by other means and gaining seemingly legitimate access to the network. The simple fact of interfacing with the Internet also opens a path for outside attackers to passively and actively try to infiltrate the network, or for viruses and other malware to be introduced.

For this reason, Intrusion Detection Systems (IDSes) and Intrusion Prevention Systems (IPSes) are becoming more commonly used. These advanced security systems are starting to use AI and neural networks to understand a network and make decisions automatically. Both kinds of system capture traffic to analyse the network over time, and use this to build baselines of its normal behaviour. They can then use these baselines to detect abnormal behaviour and act on it in different ways. For instance, say a certain section of the network backbone normally runs at 2-3% utilization, except at around 13:00 each day, when batch updates are sent to a database and utilization rises to 6-7% for 10 minutes. If the system suddenly sees this link jump to 20% utilization, this can be flagged as an anomaly and acted upon.
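The baseline idea in this example can be sketched in Python as follows. This is deliberately a simple time-of-day threshold rather than the AI-driven analysis a real product would use; the function names, the 10-minute slot size, the margin factor and the sample data are all assumptions made for illustration.

```python
def time_slot(hour, minute, slot_minutes=10):
    """Map a time of day to a slot index (slot 0 starts at 00:00)."""
    return (hour * 60 + minute) // slot_minutes


def learn_baseline(samples, slot_minutes=10):
    """Learn the highest utilization seen in each time-of-day slot.

    samples: iterable of (hour, minute, utilization_percent) taken during
    a period that is assumed to contain only normal traffic.
    """
    baseline = {}
    for hour, minute, util in samples:
        key = time_slot(hour, minute, slot_minutes)
        baseline[key] = max(baseline.get(key, 0.0), util)
    return baseline


def is_anomalous(baseline, hour, minute, util, margin=2.0, slot_minutes=10):
    """Flag utilization above margin times the learned level for that slot.

    A slot with no learned history is treated as anomalous, since there is
    nothing to compare against.
    """
    key = time_slot(hour, minute, slot_minutes)
    if key not in baseline:
        return True
    return util > margin * baseline[key]


# Illustrative history: 2-3% normally, 6-7% during the 13:00 batch update.
history = [(9, 0, 2.1), (9, 5, 2.8), (13, 0, 6.5), (13, 5, 7.0), (14, 0, 2.4)]
baseline = learn_baseline(history)

print(is_anomalous(baseline, 13, 3, 6.8))   # False: normal for the batch window
print(is_anomalous(baseline, 9, 2, 6.8))    # True: same load, wrong time of day
print(is_anomalous(baseline, 13, 4, 20.0))  # True: far above anything learned
```

The second case shows why the baseline is kept per time slot: 6.8% is unremarkable at 13:00 but unusual at 09:00.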

Different systems will take different actions, but generally an IDS will detect this anomaly and notify an administrative user, who can then act on this information manually, for example by checking the link for unusual activity (or possibly by simply acknowledging the anomaly and allowing it to continue because, for instance, they are doing device updates that day). An IPS will often take a more active role (still defined by administrative policy), for instance by sending a command to the Next Generation Firewall that controls this traffic, telling it to block the new anomalous traffic. In our example above this would mean that the device upgrades the administrator is doing would be cut off by the system and fail. The administrator would then have to undo this change and approve the traffic manually (they would of course be notified of the change by the IPS). However, this small amount of hassle and annoyance is worth it if the same system can automatically stop an attack from occurring. This brings us back to the point that security now takes priority even over production: we would rather have a temporary production loss because some system interfered with a new process than allow an attacker to influence that process without our knowledge or control.
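The difference in behaviour can be summarised in a short Python sketch. It is a toy model under simplifying assumptions: the `Responder` class, its methods and the flow label are hypothetical, and a real IPS would issue commands to a firewall rather than update a set in memory.

```python
class Responder:
    """Toy model of how an IDS and an IPS react to the same anomaly."""

    def __init__(self, mode):
        if mode not in ("ids", "ips"):
            raise ValueError("mode must be 'ids' or 'ips'")
        self.mode = mode
        self.notifications = []  # messages sent to the administrator
        self.blocked = set()     # flows the firewall has been told to block
        self.approved = set()    # flows an administrator has signed off

    def on_anomaly(self, flow):
        """Handle a flow that the detection stage has flagged as anomalous."""
        if flow in self.approved:
            return  # already reviewed and accepted by an administrator
        self.notifications.append(f"anomaly detected: {flow}")
        if self.mode == "ips":
            # An IPS acts on its own, then tells the administrator.
            self.blocked.add(flow)
            self.notifications.append(f"blocked: {flow}")

    def approve(self, flow):
        """Administrator undoes a block and accepts the flow as legitimate."""
        self.blocked.discard(flow)
        self.approved.add(flow)


ids = Responder("ids")
ips = Responder("ips")
for system in (ids, ips):
    system.on_anomaly("firmware-upgrade")

print(ids.blocked)  # set()
print(ips.blocked)  # {'firmware-upgrade'}

ips.approve("firmware-upgrade")
print(ips.blocked)  # set()
```

The IDS leaves the decision entirely to the administrator, while the IPS interrupts the upgrade first and relies on the administrator to reverse the block.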

While there are many different security protocols and systems, with new ones coming out every day, especially at the higher layers, from a network security point of view we are normally concerned with layers up to layer 4 (although again this line is becoming blurred with the introduction of AI and Next Generation security), and so we will stop the discussion at this point. The important takeaway is that security requires both a holistic and a piece-by-piece approach, and that networks and the control systems that use them should be designed and implemented with this in mind. It is also important to note that perfect security is generally not achievable. Rather, security is meant as a deterrent: we want to make it not worth an attacker’s time and effort to try to breach the network. Once again this is similar to home security. No matter how high one builds a wall, there is always a ladder that can be built to get over it, so as long as our wall is high enough that the ladder will cost the attacker more than they gain, we can rest easy knowing that we are protected for now (until ladder technology progresses and costs drop, of course, at which time we will need to start building that wall higher once again).