Cloud solutions have long been hailed as the key to boosting competitiveness. In many ways, this is true: you can’t scale and go global with the same offerings and services you have on-premises. By leveraging a public cloud provider, you can quickly spin up an e-commerce site, scale it up to handle traffic, and scale it back down just as efficiently.
On paper, this sounds simple, but in practice, it requires redesigning your platform’s existing apps and services. You must also redesign your backup plan, disaster recovery, and security settings and maintain least-privilege access, among other things. This is often a time-consuming and resource-intensive process, so for many companies a “lift and shift” approach (moving their current infrastructure to the cloud without redesigning it) looks more attractive than starting from scratch.
However, I advise avoiding the “lift and shift” approach. While it may seem more manageable, it often leads to rising costs and frustrated engineers who don’t have the time to learn new technologies or methods.
It’s easy to say, “This year, we’re moving to the cloud,” and I believe it’s a great move. But unless your team is already proficient, it may not be as straightforward as it seems. Transitioning to the cloud takes years of dedication, testing, learning from failure, and documenting existing processes in a way everyone can understand. Automation only works when you know your processes inside out, not the other way round.
So, what can you do as a business owner?
At the Team Level
First, don’t be convinced that moving everything to the cloud will solve all your operational problems. It’s tempting, but don’t fall for it. Will you save money? It all depends on how you utilise cloud resources. Costs will increase if you simply replicate your on-prem virtual machines in the cloud. Specific machines can cost £3,000 a month or more, and your monthly costs could easily reach tens of thousands, or even hundreds of thousands, if you’re not careful.
Instead, consider investing a fraction of that money into upskilling your team so they can truly learn how to work in the cloud.
Be Transparent
Check in with your team to understand their feelings about adopting cloud or SaaS technologies. Many will be resistant, but if you include select individuals and help them feel like they’re part of the journey, you’ll have a better chance of success.
Remember, no platform is flawless, no matter how well-intentioned or expertly built. As you transition to the cloud, the flaws and inefficiencies in your platform will likely become more apparent—and this will have operational implications beyond your platform itself.
Operational Efficiency
You need to know which servers and services can be shut down outside business hours. Tag them and create scripts to automate shutting them down and waking them up at the correct times. Experiment with reducing instance types to lower specs and see how they perform. Sometimes, the team responsible for a service needs to address inefficiencies within the service itself before scaling it vertically.
On-prem, you’d need to justify and acquire specific types of services. The same principle applies in the cloud. It’s easy to scale a Windows virtual machine vertically and let it run all day at £5,000 per month—especially if logging is enabled, adding another cost layer.
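To make the tagging and scheduling idea concrete, here is a minimal sketch in Python. It only decides what should be started or stopped; the actual start and stop calls depend on your cloud provider's SDK and are left out. The tag name `schedule`, the value `business-hours`, and the 07:00 to 19:00 weekday window are assumptions chosen for illustration, not a provider convention.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Dict


@dataclass(frozen=True)
class Resource:
    name: str
    tags: Dict[str, str]


# Hypothetical convention: only resources tagged schedule=business-hours
# may be stopped outside the window below. Adjust to your own hours.
BUSINESS_START = time(7, 0)
BUSINESS_END = time(19, 0)


def should_be_running(resource: Resource, now: datetime) -> bool:
    """Return True if the resource should be up at the given moment."""
    if resource.tags.get("schedule") != "business-hours":
        return True  # anything not explicitly opted in stays on
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    in_window = BUSINESS_START <= now.time() < BUSINESS_END
    return is_weekday and in_window


def plan_actions(resources, running, now):
    """Compare desired state with actual state and list the actions to take."""
    actions = []
    for r in resources:
        want = should_be_running(r, now)
        is_up = r.name in running
        if want and not is_up:
            actions.append(("start", r.name))
        elif not want and is_up:
            actions.append(("stop", r.name))
    return actions
```

Leaving untagged resources running is a deliberate choice in this sketch: a forgotten tag then costs money rather than causing an outage.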
Empower Your Team to Research Alternatives
Encourage your team to explore better ways of optimising your cloud infrastructure. Allow half the team to focus on this while the other half continues business-as-usual tasks and ongoing projects. Reduce the number of projects—many are just “tick-box” exercises that don’t add value to your business. Stay lean and focused on what matters most.
The strategy of doing more with fewer people works only if there’s a solid level of automation, well-understood prioritisation, and effective communication channels.
These steps allow you to optimise your cloud infrastructure, reduce unnecessary costs, and build a more agile, efficient platform that supports sustainable growth.
See Things as They Are
Don’t hide behind reports and endless meetings. I sometimes suggest to business owners that they imagine they are stuck in a traffic jam and question why it’s happening. Is it because of rush hour, too many people, poor road design, or a lack of alternative routes? Now, imagine your company in the same situation. When you open the doors and customers rush in, if your platform isn’t prepared, everything will stall—services will likely break down, customers will complain, and the IT team will have to work overtime to fix the mess.
If your processes are inefficient, a traffic jam will occur, and time, the very thing you “buy” from your employees, will be wasted because they’ll be stuck there with you.
Optimise Your Infrastructure and Reduce Costs
This is no easy feat, and hopefully, your team will be able to assist. If not, select what they can do now and do it well.
Things You Can Plan or Delegate:
- Evaluate Your Current Infrastructure
  - Know what is powering your business: give each resource a meaningful name, catalogue it, and build a dashboard showing what is up and what is down. Use colour coding if necessary.
  - Ask your team to determine which resources are unused, under-spec, or over-spec.
  - Use cost-management tools to show where you’re spending. Most cloud vendors provide them, and they are not hard to learn.
  - Keep your platform simple, easy to understand, and focused on delivering value. I have seen many IT teams exist just for the sake of ticking boxes and demanding that others endure heavy meetings and endless complaints.
- Decouple Your Services
  - Time and again, I still see databases tied to the servers that host them. I advise keeping them separate from the VMs and putting them on dedicated storage or, better yet, a managed SQL service. This way, when a server needs upgrading, you can run two lower-spec servers and fail over from one to the other whenever either requires patching. Simplify your life. Sometimes, the fractional speed improvement doesn’t justify the headaches of maintaining the whole server, especially when it breaks down.
- Adopt a Mindset of Data Over Runtime
  - Your data is essential; that’s rule number one. Servers and services that compute business logic are nothing without data. Adopt a mindset where a server or service can be terminated at any time, knowing that another will replace it quickly.
- Right-Size Your Resources
  - Adjust Instance Sizes: Avoid over-provisioning resources. For example, if your workload doesn’t require the highest-tier virtual machine, scale down to save costs.
  - Use Autoscaling: Implement autoscaling for compute resources, which allows your cloud infrastructure to adjust dynamically based on traffic or demand, reducing the need for always-on resources.
- Leverage Reserved Instances or Savings Plans
  - Reserved Instances: Commit to using specific instances for a longer period (1 or 3 years) in exchange for lower rates. This can lead to significant savings if your usage is predictable.
  - Savings Plans: If you’re using AWS, Azure, or Google Cloud, consider savings plans or the provider’s equivalent (Google Cloud calls them committed use discounts), which allow for flexible usage across different instance types while still offering a discount.
- Optimise Storage Costs
  - Use Lower-Tier Storage: Use lower-cost storage options like cold or archive storage for less frequently accessed data.
  - Implement Data Lifecycle Policies: Set up policies to automatically transition older data to cheaper storage tiers or delete unnecessary data. Regularly clean up old backups and logs.
- Automate Resource Management
  - Automated Shutdowns: Automatically shut down non-essential services during off-peak hours. Use scheduling tools to turn off servers, databases, and other resources when not in use.
  - Tagging and Scheduling: Tag resources by environment (e.g., production, development, testing) to make it easier to manage and allocate costs. Schedule resources to automatically scale or shut down when no longer needed.
- Monitor and Optimise Performance
  - Cloud Monitoring: Use monitoring tools to track resource usage and performance. Tools like Azure Monitor, AWS CloudWatch, and Google Cloud Monitoring (formerly Stackdriver) help identify bottlenecks, over-provisioned resources, and areas where performance can be improved without adding cost.
  - Load Balancing: Implement load balancing so that no single server or instance is overburdened, which would lead to inefficiencies. Distribute traffic efficiently across resources to optimise performance and reduce unnecessary overhead.
- Optimise Network Costs
  - Minimise Data Transfer Fees: Data transfer costs can quickly increase, especially when moving data between regions or outside the cloud. Minimise cross-region data transfers by placing resources closer to where they are needed or using content delivery networks (CDNs).
  - Use Private Connectivity: If your cloud provider offers direct connectivity options (e.g., AWS Direct Connect, Azure ExpressRoute), consider using them to reduce data transfer costs between your on-premises infrastructure and the cloud.
- Focus on Automation and DevOps Practices
  This is an advanced option since it requires considerable maturity to deploy and adopt. Don’t fall into the trap of “we put this in containers, deploy over IaC, and all is well.”
  - Automated Deployment: Use Infrastructure as Code (IaC) tools like Terraform, CloudFormation, or ARM Templates to automate the provisioning and management of cloud resources. This ensures resources are only provisioned when necessary and avoids manual errors that may lead to inefficiencies.
  - I recommend telling your team to learn Ansible so they can deploy and patch your platform consistently.
  - Continuous Monitoring and Improvement: Build feedback loops into your DevOps pipeline, with constant monitoring to ensure resources are optimised and unnecessary services are removed. Regularly review your architecture to identify cost-saving opportunities.
- Adopt Serverless or Containerised Architectures
  I would suggest learning how your service platform works first. A monolithic architecture is still valid, so use the suggestions below with caution. This is where vendor lock-in happens. And if your engineering team dismisses this or tells you that they are not in the mood to redesign or fix things in the current platform, you have a problem.
  - Serverless Computing: Explore serverless options for specific workloads, such as AWS Lambda, Azure Functions, or Google Cloud Functions, where you pay only for the actual compute time rather than idle time.
  - Containers and Kubernetes: Implement containers and Kubernetes for efficient scaling and management of applications. By using containerised environments, you can optimise resource utilisation and avoid over-provisioning.
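The right-sizing items in the list above (finding resources that are unused, under-spec, or over-spec) can be sketched as a small classifier over utilisation data. This is an illustration only: the `UsageSample` shape and the CPU thresholds are hypothetical, and in practice the samples would come from whichever monitoring tool you use, ideally with memory and I/O considered alongside CPU.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class UsageSample:
    resource: str
    cpu_percent: float  # average CPU over one sampling interval


# Hypothetical thresholds; tune them to your own workloads.
IDLE_BELOW = 5.0
OVERSIZED_BELOW = 30.0
UNDERSIZED_ABOVE = 80.0


def classify(samples):
    """Group samples by resource and label each resource by its mean CPU."""
    by_resource = {}
    for s in samples:
        by_resource.setdefault(s.resource, []).append(s.cpu_percent)

    report = {}
    for name, values in by_resource.items():
        avg = mean(values)
        if avg < IDLE_BELOW:
            report[name] = "unused"
        elif avg < OVERSIZED_BELOW:
            report[name] = "over-spec"
        elif avg > UNDERSIZED_ABOVE:
            report[name] = "under-spec"
        else:
            report[name] = "ok"
    return report
```

A report like this is a starting point for a conversation with the team that owns each service, not an instruction to resize automatically.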
Adopt a “bazaar” mentality with your platform. When there are no customers, shut down services that are not required to be available. Create an inventory to identify what needs to be online overnight and what doesn’t.
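As a rough illustration of what shutting things down outside trading hours can be worth, the sketch below estimates the monthly cost of a machine that runs only part of the week. It takes the £5,000 per month always-on figure mentioned earlier and assumes, as a simplification, that the bill is proportional to running hours. Storage, licences, and similar charges often continue while a machine is stopped, so treat the result as an upper bound on the saving. The 12 hours a day, 5 days a week schedule is an assumed example.

```python
HOURS_PER_WEEK = 24 * 7


def scheduled_monthly_cost(always_on_cost, hours_per_day, days_per_week):
    """Estimate monthly compute cost when a machine runs only part of the week.

    Assumes billing is proportional to running hours, which ignores charges
    that continue while the machine is stopped (storage, licences, and so on).
    """
    if not (0 <= hours_per_day <= 24 and 0 <= days_per_week <= 7):
        raise ValueError("schedule must fit within a 24-hour day and 7-day week")
    running_hours = hours_per_day * days_per_week
    return always_on_cost * running_hours / HOURS_PER_WEEK


always_on = 5000.0  # GBP per month, the always-on figure used in this article
scheduled = scheduled_monthly_cost(always_on, hours_per_day=12, days_per_week=5)
print(f"always on: GBP {always_on:,.0f}  scheduled: GBP {scheduled:,.0f}")
# prints "always on: GBP 5,000  scheduled: GBP 1,786"
```

Under these assumptions the machine runs 60 of 168 hours a week, so roughly two thirds of the compute spend disappears for a service nobody uses overnight or at weekends.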