DrugBank is a curated pharmaceutical knowledge base for precision medicine, electronic health records, and drug development. We collect, curate, and organize all the important information about drugs into a single unified resource. We also offer a free and ad-free website, drugbank.com, that is used by clinicians, nurses, pharmacists, researchers, and students around the world.
We do a lot of heavy data processing and also maintain a website with millions of users. As a data company, we need plenty of storage for our stack, which includes a graph database, caching, indexing, and a relational database.
We’ve found UpCloud to be far more competitive on both price and performance than traditional cloud providers. Previously, we used dedicated hosting to get a large amount of memory and compute power, but UpCloud offers a perfect middle ground between fast dedicated servers and the normally slow, throttled cloud services. Additionally, we experienced a ton of network issues in our cluster with our previous host. The network on UpCloud has been rock solid.
Our main goal was to get a highly stable and scalable infrastructure at a reasonable price. We looked at other solutions such as AWS, Google Compute, DigitalOcean, and OVH and found none that could compete. The storage performance on UpCloud is as close to dedicated servers as you can get.
Hosting on UpCloud
We have customers around the world using our APIs and website. Because our site is very data-heavy, scaling page views with many queries and joins requires a strong combination of I/O performance, solid networking for caching and database reads, excellent redundancy, and the ability to deploy across data centres to ensure fast access. Additionally, we do a ton of data crunching in our import pipeline, which runs nightly, grabbing and integrating drug information sources from around the world. For this, we need lots of cores and memory, with easy expandability as we add new data sources and algorithms. We also have a strong focus on machine learning, specifically NLP, and training our models has been working well on the high-performance solutions offered by UpCloud.
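To make the idea of integrating several sources concrete, here is a minimal sketch of one way such a merge step could look. It is not our actual pipeline: the record shape, the `drug_id` key, the `integrate` function, and the sample data are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Hypothetical record shape: every source yields dicts that share a "drug_id" key.
Record = Dict[str, str]


def integrate(sources: Iterable[Iterable[Record]]) -> Dict[str, Record]:
    """Merge records from several sources, keyed by drug_id.

    Earlier sources take priority: a later source only fills in
    fields that are still missing for a given drug_id.
    """
    merged: Dict[str, Record] = defaultdict(dict)
    for source in sources:
        for record in source:
            target = merged[record["drug_id"]]
            for field, value in record.items():
                # setdefault keeps an existing value and only adds missing fields.
                target.setdefault(field, value)
    return dict(merged)


# Made-up sample data, for illustration only.
source_a = [{"drug_id": "D1", "name": "Example drug"}]
source_b = [
    {"drug_id": "D1", "name": "Other name", "class": "Example class"},
    {"drug_id": "D2", "name": "Second drug"},
]

result = integrate([source_a, source_b])
```

In this sketch the order of the sources decides which value wins when two sources disagree, which is only one of several possible conflict-resolution policies.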
Our cloud infrastructure includes 3 primary database servers arranged in a cluster, hosting Neo4j, MariaDB, and Elasticsearch. Another database server is used for storing, searching, and computing properties on chemical structures. We also employ a few Redis servers and web servers that host drugbank.ca, drugbankplus.com, the API, docs, etc. These web servers run Ruby on Rails and connect to the database cluster and Redis.
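The way a web server can combine a cache with a database is often described as the cache-aside pattern. The sketch below illustrates that pattern under stated assumptions: it is written in Python rather than Ruby, `TTLCache` is an in-memory stand-in for a cache server such as Redis, and `fetch` and `slow_query` are hypothetical names. It is not a description of our production code.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a cache server such as Redis (assumption)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # The entry has expired: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)


def fetch(cache: TTLCache, key: str, load_from_db: Callable[[str], str]) -> str:
    """Cache-aside read: try the cache first, fall back to the database."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(key)
    cache.set(key, value)  # Populate the cache for later reads.
    return value


db_calls: List[str] = []


def slow_query(key: str) -> str:
    """Hypothetical database lookup that records each call."""
    db_calls.append(key)
    return f"row for {key}"


cache = TTLCache(ttl_seconds=60)
first = fetch(cache, "drug:1", slow_query)   # Cache miss: queries the database.
second = fetch(cache, "drug:1", slow_query)  # Cache hit: no database call.
```

Only the first read reaches the database here, which is why fast networking between the web servers and the cache matters for a read-heavy site.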
In addition, we have worker servers that run the data pipeline using a Sidekiq queue, and we provision everything from a SaltStack server. Backup servers manage secondary off-site backups to S3, and we also use the UpCloud backups. On top of all this, we’ve set up replicas of all of these services in our staging and QA environment.
We use MaxIOPS storage on all servers, take backup snapshots, and configure the firewall for every server through the firewall service. All communication between servers goes over the private network. We don’t yet use the UpCloud API but intend to in the future. Cloudflare load-balances traffic to our UpCloud web servers. We also host our staging/QA environment on UpCloud.
The time we spend monitoring, fixing, and dealing with outages has gone down to almost nil. Our previous host had terrible customer service and we were always having networking issues. The customer service with UpCloud has been exemplary and has inspired us in how we serve our own customers.
Since migrating to UpCloud, our website has been much faster than it was on our previous host, as reported by Google PageSpeed Insights. We believe the reliability and speed of the private network are partly responsible, as well as the faster storage offered by MaxIOPS.
We have had no issues with the reliability, consistency, or security of UpCloud services. They have reacted rapidly to known exploits and have responded quickly and thoroughly to any customer support issues we have faced.
We are expanding our use of machine learning and always growing our datasets. Our customer base has grown by over 1200% over the past 3 years, so we will need to continue to scale to new locations and add greater redundancy. We plan on making use of the UpCloud API as well as the newly launched networking features built on the SDN architecture.
As for any new features, we would like to see UpCloud offer GPU servers built for machine learning applications.