When you choose Targetprocess’ On-Demand (SaaS) solution, it is natural to have some questions about the safety of your data and application availability. What happens if my server breaks? Will I lose my data one day? We understand the importance of such concerns and would like to share some information about our infrastructure and the procedures we follow when something goes wrong.
This article focuses mainly on high-level infrastructure aspects. For information about data access, network security, etc., you are welcome to check our On-Demand Security Notes.
Here is a very simplified structure of the servers that are used for your On-Demand account:
For each of these components, there is most certainly a “Plan B” in the event of failure. Let’s have a look at them in more detail.
We run two proxy servers, so if one of them goes down, all traffic is redirected to the second proxy server and things should work just fine. At the time of writing, this switch could cause up to 10 minutes of downtime. However, we are working on a switch without any noticeable downtime and plan to add it very soon.
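The redirect described above can be pictured as "send traffic to the first proxy that passes a health check". The following is only a minimal sketch of that idea, not a description of how Targetprocess actually implements it; the proxy names, the `pick_active_proxy` function, and the health-check callback are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def pick_active_proxy(
    proxies: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first proxy that passes its health check, or None if all are down."""
    for proxy in proxies:
        if is_healthy(proxy):
            return proxy
    return None


# Hypothetical scenario: the first proxy is down, so traffic goes to the second.
status = {"proxy-a": False, "proxy-b": True}
active = pick_active_proxy(["proxy-a", "proxy-b"], lambda name: status[name])
print(active)
```

In this toy model the order of the list decides which proxy is preferred, and `None` stands for the case where both proxies are unavailable.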
Currently the applications are distributed across about 15 servers. If your application server goes down, we can quickly switch you to a different one. The application servers do not store your data, so even if one of them is lost forever, this doesn’t constitute a catastrophe.
The Live Updates feature is handled by dedicated servers, which were added to improve performance. There are 3 of them working together behind load balancing, each in a different pod and even in a different data center. If they go down, the main application won’t be affected, as they are responsible only for the Live Updates functionality.
An optional component of the infrastructure, the REST server can be used for load balancing in order to achieve better performance. At the time this article was written, this server was not in use, but it can be activated if required. If it goes down, application and data availability are not affected.
The main SQL server is where most of your data is stored, and it is our highest priority. First of all, a copy of the main server runs in the background and is used for failover and continuous backups. Because of the large amount of data we host, the downtime for the main SQL server can be up to 20 minutes.
Apart from that, all data are backed up in the cloud in case both servers are lost. Actually, we store these backups in two clouds, not just one. They are even in different datacenters: one in the USA and another one in Europe. So, even if a meteor strikes one of the datacenters, we’ll still have your data.
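The reasoning behind these layers is that your data stays recoverable as long as at least one copy survives. Here is a small sketch of that logic; the copy labels and function names are hypothetical and chosen only for illustration, based on the four copies mentioned in the text (main server, failover copy, and two cloud backups).

```python
from typing import FrozenSet, Iterable

# Hypothetical labels for the copies described in the article.
COPIES: FrozenSet[str] = frozenset(
    {"main-sql", "failover-sql", "cloud-backup-usa", "cloud-backup-europe"}
)


def surviving_copies(failed: Iterable[str]) -> FrozenSet[str]:
    """Return the copies that are still available after the given failures."""
    return COPIES - frozenset(failed)


def is_recoverable(failed: Iterable[str]) -> bool:
    """Data can be restored as long as at least one copy is left."""
    return len(surviving_copies(failed)) > 0


# Both SQL servers and one cloud backup are lost; one backup still remains.
lost = ["main-sql", "failover-sql", "cloud-backup-usa"]
print(sorted(surviving_copies(lost)), is_recoverable(lost))
```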
We understand that in some cases you would prefer us to host your information in only one datacenter, perhaps due to legal regulations. If this is the case, please contact us and we’ll exclude your account from international replication or even set up a private cloud for you.
The file servers are where your images and attachments are stored. Each datacenter has backup file servers that are located in different pods in case of a network or power outage. Replication happens every 20 minutes, and failover is automatic.
This is not a complete description of our infrastructure, which currently has more than 50 servers, but it gives a good overview of the infrastructure layout. Targetprocess infrastructure is always evolving, but rest assured that all changes we make are to improve the reliability of the system.
So, the answer to the question “What happens if … ” is “Most likely you won’t even notice anything”. Apart from the effort we invest in infrastructure, we also have advanced monitoring with more than 1800 sensors to prevent issues. The system is monitored by our Support and Infrastructure Teams and has different ways of automatically notifying us in case something goes wrong.
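To give a rough idea of what sensor-based monitoring with automatic notification can look like, here is a minimal sketch. It is an illustration under assumptions, not the actual monitoring system: the `Sensor` class, the threshold rule, and the notification channels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Sensor:
    """A hypothetical sensor: a name, a way to read a value, and an alert threshold."""
    name: str
    read: Callable[[], float]
    threshold: float


def failing_sensors(sensors: Sequence[Sensor]) -> List[str]:
    """Return the names of sensors whose current reading exceeds their threshold."""
    return [s.name for s in sensors if s.read() > s.threshold]


def notify_all(message: str, channels: Sequence[Callable[[str], None]]) -> None:
    """Send the same message through every notification channel."""
    for send in channels:
        send(message)


# Hypothetical readings: CPU load is above its threshold, disk usage is not.
sensors = [
    Sensor("cpu-load-percent", lambda: 95.0, 90.0),
    Sensor("disk-usage-percent", lambda: 40.0, 80.0),
]
alerts = failing_sensors(sensors)
if alerts:
    notify_all("Sensors over threshold: " + ", ".join(alerts), [print])
```

Passing several callables in `channels` models the "different ways of notifying" mentioned above, for example one per team or per messaging tool.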
While in our business it is not possible to predict absolutely all issues that may arise and guarantee 100% uptime, what we can guarantee is that your data will be safe with us and that we’ll do everything possible to prevent issues and minimize impact.