Configuration, Architecture, and Change
Why Your Industrial Control System Is Broken
What This Article Covers
- What Broken Really Means: Why a system can be running and still be showing serious signs of dysfunction.
- Configuration Drift: How inconsistent setup across assets makes troubleshooting slower and trust weaker.
- Weak Architecture: Why systems that grew project by project often become harder to upgrade and support.
- Unmanaged Change: How undocumented edits and poor change discipline erode even good systems over time.
- How the Problems Reinforce Each Other: Why configuration, architecture, and change have to be managed together.
- What You Can Do About It: Practical steps to regain control without tearing everything down and starting over.
Introduction
Have you ever walked into a control room, looked around at everything running, and thought, “this whole thing feels harder to manage than it should”?
Nothing has completely failed. The equipment is running. Alarms are coming in. Operators are getting through the shift. But somehow, the whole system feels heavier than it should. Troubleshooting takes too long. Every change feels risky. People rely on memory more than documentation. Everyone is treading water.
After 36 years in industrial controls, I can tell you that feeling is almost always telling you something real. Most broken control systems do not become broken overnight. They become broken slowly through three things: configuration, architecture, and change. On their own, each one can make a system harder to live with. Together, they are behind most of the dysfunction I have seen in ICS and OT environments.
Broken Does Not Always Mean Down
When people hear “broken system,” they usually picture the obvious version. Lights out. Equipment stopped. Production halted. A major alarm on the screen. In industrial control systems, that is often the easy kind of broken, because at least everyone knows something is wrong.
The harder kind of broken is quieter. The system is technically operating, but it is difficult to troubleshoot. Similar areas behave differently for no obvious reason. Operators do not trust the alarms. Engineers are afraid to touch old logic. Drawings are questionable. Only one or two experienced people really understand how the pieces fit together.
Field Test: Is the System Already Showing Signs of Being Broken?
If every change feels dangerous, every problem takes too long to solve, and only one or two people really understand the system, then the system is already showing signs of dysfunction. It may still be running, but it is not healthy.
This matters because industrial environments depend on more than uptime. They depend on maintainability, repeatability, and trust. A system that cannot be safely changed, clearly understood, or reliably supported is a business risk and, in many industries, a safety risk.
1. Configuration Drift Makes Everything Harder to Trust
A lot of systems are not suffering because of bad hardware. They are suffering because of poor and inconsistent configuration. Those are two very different problems.
Think about ten similar pumps across a facility. Same manufacturer. Same model. Five were set up by one integrator five years ago. The other five were set up by a different contractor two years later. Different tag naming. Different alarm limits. Different HMI faceplate behavior. Some bypasses were added during startup and never removed.
Now multiply that across 10 or 15 years of projects, contractors, programmers, and expansions. What you get is a system where nothing is quite predictable. Troubleshooting slows down because the technician cannot assume that similar assets behave the same way. Everything has to be verified from scratch.
Real-World Pattern: “This Area Works Differently”
I have been in facilities where a senior technician would point at a section of the plant and say, “That area was done by a different integrator, so it works differently than the rest.” That knowledge should not live only in someone’s head. Once the person who knows the exceptions leaves, the site loses the map.
Configuration drift is not just PLC programming style. It includes alarm limits that no longer match how operations actually runs, user accounts and permissions nobody has reviewed in years, HMI screens that look different from building to building, and inconsistent field device settings across equipment that should be identical.
What Good Looks Like
Start with a configuration baseline. Define naming conventions, alarm practices, HMI faceplate behavior, security roles, device settings, and backup expectations. Even if the old system is messy, you can stop creating new inconsistency starting with the next project.
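One way to keep a baseline from becoming a document nobody opens is to make parts of it machine-checkable. The Python sketch below is a minimal illustration of that idea; the tag naming pattern and required fields are invented for this example, not a real site standard.

```python
import re

# Hypothetical site baseline; the naming pattern and required fields
# below are invented for illustration, not a real standard.
TAG_PATTERN = re.compile(r"^[A-Z]{2,4}-(PMP|VLV|TNK)-\d{3}$")
REQUIRED_FIELDS = {"description", "units", "alarm_class"}

def check_tag(name, config):
    """Return a list of baseline violations for one tag."""
    problems = []
    if not TAG_PATTERN.match(name):
        problems.append(f"{name}: name does not match the site convention")
    missing = REQUIRED_FIELDS - config.keys()
    if missing:
        problems.append(f"{name}: missing required fields {sorted(missing)}")
    return problems

# A tag created outside the convention is flagged immediately.
print(check_tag("pump7", {"units": "gpm"}))
```

A check like this can run against exported tag databases before a project is accepted, which is exactly the point where new inconsistency normally slips in.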
2. Weak Architecture Turns Local Fixes Into Long-Term Complexity
Even good configuration cannot fully save a system with weak architecture underneath it. Configuration and architecture go hand in hand.
Here is the thing: many facilities did not start with bad architecture. They evolved into it. Project by project. Budget cycle by budget cycle. Expansion by expansion. A new vendor came in. A new platform solved a local problem. A temporary integration became permanent. A reporting tool got added because the historian could not quite do what the team needed.
At the time, each decision made sense locally. It solved a specific problem and got the project done. But over time, the facility ends up with too many platforms, too many vendors, poor separation between layers, and duplicated functions split between SCADA, historians, PLCs, and third-party tools.
| Healthy Architecture | Patchwork Architecture |
|---|---|
| Clear separation between control, supervisory, historian, reporting, and enterprise layers | Functions duplicated across multiple platforms with unclear ownership |
| Standard patterns for new equipment, integrations, and remote access | Every project adds its own pattern, exception, or tool |
| Network segmentation and data flows are documented and intentional | OT and IT connections grew over time without a clear design intent |
| Upgrade paths are understood before lifecycle pressure becomes urgent | Replacement projects get delayed because nobody fully understands the existing system |
Architecture problems usually do not hurt right away. That is what makes them dangerous. They show up later as complexity, brittleness, rising lifecycle costs, and upgrade projects that take far longer than expected because the existing system is so tangled.
Watch Out For: Architecture Without a Target
If nobody can describe the target architecture, every project decision becomes a local decision. Local decisions are not automatically wrong, but without a long-term direction, they can pull the facility further away from a maintainable system.
3. Unmanaged Change Erodes Good Systems
This is where a lot of good systems start to fall apart. Change is constant in industrial environments. Equipment gets upgraded. Processes shift. New reporting requirements come in. Security expectations change. You are never going to stop change from happening, and you should not try to.
The problem is not change itself. The problem is uncontrolled, undocumented, poorly governed change.
Emergency edits get made directly in production under pressure. Nobody has time to write them up, so they never get documented. Drawings never get updated. PLC logic is modified but backed up incorrectly, if at all. Contractors come in, do the work their way, and leave. Your team does it another way. Neither approach gets captured clearly enough for the next person.
The Line That Matters
Most broken systems are not systems that were designed badly once. They are systems that were changed hundreds of times without enough discipline around each change.
This is why management of change has to be more than a compliance form. It needs to be a real operating discipline. Changes should be reviewed, tested, documented, backed up, and reflected in drawings and records as part of the work, not as something everyone hopes to clean up later.
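To make that concrete, a change record can refuse to close until every step is done. The sketch below is one possible shape for such a record; the step names and the example change are assumptions for illustration, not taken from any specific MOC system.

```python
from dataclasses import dataclass

# Illustrative management-of-change record; the step names are
# assumptions, not taken from any specific MOC system.
@dataclass
class ChangeRecord:
    change_id: str
    description: str
    reviewed: bool = False
    tested: bool = False
    documented: bool = False
    backed_up: bool = False
    drawings_updated: bool = False

    def open_items(self):
        """Steps that must be finished before the change can close."""
        steps = {
            "review": self.reviewed,
            "test": self.tested,
            "documentation": self.documented,
            "backup": self.backed_up,
            "drawing update": self.drawings_updated,
        }
        return [name for name, done in steps.items() if not done]

    def close(self):
        pending = self.open_items()
        if pending:
            raise ValueError(f"{self.change_id} cannot close; pending: {pending}")

change = ChangeRecord("MOC-0142", "Raise PMP-003 high alarm from 85 to 88")
change.reviewed = change.tested = True
print(change.open_items())  # documentation, backup, drawing update still open
```

The design choice matters more than the code: documentation, backup, and drawings are gates on closing the change, not cleanup tasks for later.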
Configuration, Architecture, and Change Reinforce Each Other
These three problems do not stay in their own lanes. They feed each other.
Weak architecture makes configuration harder to standardize. If the facility has fragmented platforms and no clear design intent, there is no solid foundation for consistency. Configuration drift almost becomes inevitable.
Poor configuration makes every future change riskier. If you cannot trust the baseline, every modification becomes an investigation. You are not just making a change; you are trying to figure out what the current system really is.
Unmanaged change slowly erodes both configuration and architecture. Every undocumented edit makes the baseline less trustworthy. Every project that bolts something on without checking the architecture makes the overall design a little weaker.
How Control System Drift Builds Over Time
1. Local Fix: A change solves an immediate problem but is not fully documented.
2. New Exception: The system now behaves differently in one area than another.
3. Tribal Knowledge: People learn the workaround, but the official record does not change.
4. Higher Risk: The next change takes longer and carries more uncertainty.
5. Lost Confidence: Operators and engineers stop trusting the system as a reliable source of truth.
Key Point
You cannot fix one of these problems and ignore the other two. Configuration, architecture, and change have to be managed together because they are connected in the real system.
What This Looks Like in a Real Facility
When drift has taken hold, the symptoms are usually familiar. Operators do not trust the alarms because there are too many nuisance alarms or alarms that have been wrong for too long. Maintenance staff work around the system instead of with it. Engineers avoid old code because nobody knows what it does.
Similar assets behave differently for no obvious reason. Drawings are questionable. Upgrades take twice as long as they should because the first two weeks are spent figuring out what is actually there. Troubleshooting depends on one veteran person who carries the whole system in their head.
Common Warning Signs
Alarm Fatigue
Operators ignore alarms because too many are nuisance, stale, or poorly prioritized.
Unsafe Reliance on Memory
Critical knowledge lives in one or two people’s heads instead of drawings, standards, and records.
Fear of Old Code
Engineers avoid parts of the logic because nobody understands the consequences of touching it.
Project Drag
Every upgrade starts with a discovery project because the actual state of the system is unclear.
Field Insight
When a system becomes dependent on memory instead of standards, drift has already taken hold. That does not mean the facility is doomed. It does mean the team needs to start rebuilding control over the lifecycle.
What You Can Actually Do About It
Let me be clear: I am not talking about tearing everything down and starting over. That is rarely practical, and it is often not necessary. The real goal is to regain control of the system lifecycle gradually and deliberately.
Four Disciplines That Stop the Drift
Standardize Configurations
Define naming conventions, alarm philosophy, HMI behavior, device settings, security roles, and backup expectations. Use them going forward.
Define a Target Architecture
Even if you cannot get there tomorrow, know where the system is supposed to go. Check every project decision against that direction.
Use Real Management of Change
Review, test, document, back up, and update drawings as part of the change. Do not leave the recordkeeping for later.
Manage Technical Debt
Treat lifecycle issues, undocumented exceptions, and fragile integrations as active risks, not just old problems waiting to fail.
One of the best questions you can ask during any project review is simple: does this decision move us toward our target architecture, or further away from it? That question alone changes the conversation. It forces the team to think beyond the immediate project and into the long-term health of the control system.
Practical Starting Point
Pick one system area or one equipment class and baseline it. Document the current configuration, compare similar assets, identify differences, and decide which differences are intentional. You do not have to fix the entire plant at once. Start where inconsistency is causing the most pain.
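The comparison step above can be sketched in a few lines of Python: collect the as-found settings for similar assets, then report every setting where they disagree. The tag names and values below are invented for illustration.

```python
from collections import defaultdict

# As-found settings for three "identical" pumps; the tag names and
# values are invented for illustration.
as_found = {
    "PMP-001": {"high_alarm": 85.0, "deadband": 2.0, "interlock": "enabled"},
    "PMP-002": {"high_alarm": 85.0, "deadband": 2.0, "interlock": "enabled"},
    "PMP-003": {"high_alarm": 92.0, "deadband": 5.0, "interlock": "enabled"},
}

def drift_report(assets):
    """Map each setting to its distinct values wherever assets disagree."""
    seen = defaultdict(set)
    for settings in assets.values():
        for key, value in settings.items():
            seen[key].add(value)
    return {key: sorted(vals, key=str) for key, vals in seen.items() if len(vals) > 1}

print(drift_report(as_found))
```

Here PMP-003 stands out on two settings. The report does not tell you which value is right; it tells you which differences need a human decision about whether they are intentional.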
The Bottom Line
Industrial control systems usually do not become broken because of one device, one bug, or one bad decision. They become broken gradually through configuration drift, weak or fragmented architecture, and unmanaged change.
If your system feels harder to maintain than it should, ask yourself where the real issue is. Is it configuration? Is it architecture? Is it change? Or, as is often the case, is it all three reinforcing each other?
The good news is that you do not have to fix everything at once. Start by regaining control of the lifecycle. Standardize what you can. Define where the architecture is heading. Make change control real. Treat technical debt as something you manage before it turns into failure.
Your Next Step
Pick one painful area of the system and ask three questions: is the configuration consistent, does this fit our target architecture, and is the current state properly documented? Those three questions will tell you more than most long meetings ever will.
Frequently Asked Questions
Q: How do I know whether the main problem is configuration, architecture, or change?
Field Guidance
Start by looking at symptoms. If similar assets behave differently, look at configuration. If upgrades and integrations are consistently painful, look at architecture. If nobody trusts drawings, backups, or current logic versions, look at change control. In many real facilities, the honest answer is all three.
Q: Do we need a major modernization project to fix this?
Field Guidance
Not necessarily. Modernization may be needed in some cases, but many facilities can make meaningful progress by standardizing new work, cleaning up high-risk configuration drift, improving documentation, and applying better management of change. The goal is to stop adding new mess while you reduce the old mess over time.
Q: What is the first thing I should document?
Practical Approach
Start with what people depend on during troubleshooting: current PLC backups, I/O lists, network drawings, alarm philosophy, critical setpoints, and a simple control narrative for each major process area. If your team regularly says, “only one person knows that,” document that area first.
Q: How do we prevent contractors from adding more inconsistency?
Field Guidance
Give contractors your standards before the work starts. Include naming conventions, HMI expectations, alarm practices, backup requirements, cybersecurity expectations, and documentation deliverables in the scope. Then review against those standards before accepting the work.
Q: What if the system is too old to fully standardize?
Honest Answer
You may not be able to make the old system perfect, and that is okay. The priority is to identify the highest-risk inconsistencies, document known exceptions, and stop introducing new ones. A messy legacy system with a known map is far safer than a messy legacy system nobody understands.
Q: How often should we review technical debt in an ICS environment?
Field Guidance
At minimum, review it during annual planning and before every major project. Better yet, keep a living technical debt register that includes aging hardware, unsupported software, undocumented integrations, fragile code areas, alarm problems, and known configuration inconsistencies. Treat it like operational risk, because that is what it is.
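A register like that can stay as plain data so it fits in a spreadsheet or version control. The sketch below shows one possible shape with a simple risk ranking; the entries and the 1-to-5 likelihood and impact scores are invented for illustration, not a standard scoring scheme.

```python
# One possible shape for a living technical-debt register; entries and
# scores below are invented for illustration.
register = [
    {"item": "HMI runtime past vendor support", "likelihood": 4, "impact": 3},
    {"item": "Undocumented historian-to-ERP link", "likelihood": 2, "impact": 5},
    {"item": "Nuisance alarms in east utilities", "likelihood": 5, "impact": 2},
]

# Rank like operational risk: highest likelihood x impact first.
for entry in sorted(register, key=lambda e: e["likelihood"] * e["impact"], reverse=True):
    print(entry["likelihood"] * entry["impact"], entry["item"])
```

The scoring scheme is less important than the habit: every entry gets an owner, a score, and a review date, and the ranked list feeds annual planning and project scoping.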
Q: How does cybersecurity fit into this conversation?
Security Reality
Poor architecture, inconsistent configuration, and unmanaged change all make cybersecurity harder. You cannot protect what you do not understand. Good segmentation, current inventories, documented remote access, reviewed accounts, and controlled changes are not just engineering hygiene, they are security foundations.
Resources and Further Reading
OT Security and Architecture Guidance
- NIST SP 800-82 Rev. 3, Guide to Operational Technology Security
- CISA Industrial Control Systems Resources
- CISA Cross-Sector Cybersecurity Performance Goals
- MITRE ATT&CK for ICS
Standards and Good Practice
- ISA/IEC 62443 Series of Standards
- ISA-18 Alarm Management Standards Committee
- NIST Cybersecurity Framework
Practical Internal Documents to Maintain
- Control system architecture diagram, including OT, IT, remote access, historian, and reporting paths
- Configuration standards for tags, alarms, HMI objects, security roles, and device settings
- Management of change procedure tied to backups, testing, drawings, and handoff documentation
- Technical debt register covering obsolete assets, unsupported software, fragile integrations, and undocumented exceptions
Professional Disclaimer
The information provided in this article represents general engineering principles and field experience accumulated over 36 years in industrial automation. This content is intended for educational and informational purposes only and should not be considered site-specific engineering, safety, cybersecurity, or regulatory advice.
Every industrial facility has unique operational, safety, environmental, cybersecurity, and regulatory requirements that must be evaluated by qualified professionals familiar with the specific systems and local codes. Always follow applicable standards, site procedures, vendor guidance, and formal engineering review before implementing changes in production environments.
The author and publisher disclaim any liability for damages, losses, or injuries that may result from the use or misuse of information contained in this article.