How we built our own in-house data center: part 1
Six months ago, after years of upgrading our server capacity in the cloud, we finally decided to build our very own development (secondary) data farm – including an array of specialized AI processing hardware – completely from scratch.
It wasn’t easy. Along the way, we faced many challenges, including finding the right supplier, securing a perfect physical location, mounting and installing the servers, testing every component and getting everything fully synchronized.
We anticipated these challenges going in, but the payoff has been even greater than we expected! Our data center now supports all our local operations in Turin, as well as some backup and failover capacity for the main production system.
Here’s part one of our exclusive, inside look at the step-by-step process of building our very first data center.
Why take on this massive project?
Like all machine learning companies, we live and breathe data.
As the volume of data we process continues to grow, so do our storage and processing needs. While we could have kept our data and processing in the cloud, we realized that to get everything we wanted out of a server, building our own was the best option.
Our four core needs:
1. Security
Priority number one: security. Every day, we deal with large confidential datasets and our clients expect the privacy of their data to be maintained at all costs. To enhance the security of our clients’ information, and to provide total transparency, we decided we needed to own the physical location and equipment where that data would be stored and encrypted.
2. Speed
When you run millions of processes per second (as we do), you’re not only worried about the server’s processing power, but also about the bandwidth of the connection. When you’re sending and receiving a terabyte of data to and from a server on the other side of the world, latency and transfer time become serious concerns, and the only way to eliminate them is to handle this type of processing locally.
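To put that in perspective, here is a quick back-of-the-envelope calculation in Python. The link speeds are illustrative examples, not measurements of our own connections:

```python
# Back-of-the-envelope transfer-time estimate for moving 1 TB of data.
# The link speeds below are illustrative examples only.

TERABYTE_BITS = 1e12 * 8  # 1 TB expressed in bits

link_speeds_bps = {
    "100 Mbit/s WAN uplink": 100e6,
    "1 Gbit/s internet link": 1e9,
    "10 Gbit/s local network": 10e9,
}

for name, bps in link_speeds_bps.items():
    hours = TERABYTE_BITS / bps / 3600
    print(f"{name}: ~{hours:.1f} hours to move 1 TB")
```

Even before latency enters the picture, moving a terabyte over a typical internet link is a matter of hours, while inside the rack it is a matter of minutes – exactly the kind of round trip we wanted to avoid.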
3. Availability
When you’re handling your processing in-house, fixing a server that goes down is as simple as running down the hall. And if you can’t get a connection back up immediately, in a worst-case scenario, you can always grab an external drive and mount it to another server. No delays or issues from relying on a remote service provider.
4. Cost
A virtual environment in the cloud with comparable processing power would cost us roughly 10 times as much as owning our own in-house servers. What's more, owning the physical hardware gives us the right balance of performance, storage and security, letting us optimize all three areas in a single stroke while quickly amortizing our initial investment. It was basically a no-brainer.
How did we approach the planning process?
Although modern servers are relatively straightforward in terms of installation, we needed quite a bit of preparation to get all our ducks in a row.
The first and biggest question to answer was, “How much computing power do we need?” To find out, we spent weeks talking with our data scientists, as well as our client team, to project the number of clients we’d be onboarding over the coming year and the data requirements to service them.
Based on our previous experience with a pure cloud environment, where cores, memory and I/O capacity are the key factors, we estimated that we would need the following (a rough sanity check of this sizing is sketched just after the list):
- 240 cores: data processing is very CPU-intensive (especially database operations)!
- 2TB of RAM: not just plenty of CPU, but also enough memory to support in-memory computation
- 100TB+ of disk space across different I/O tiers, from SSD and SAS down to SATA, for backup and bulk storage
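To make that sizing concrete, here is a minimal sanity-check sketch of the arithmetic behind those targets. The per-server figures below are hypothetical placeholders, not the actual specs of the machines we bought:

```python
# Hypothetical capacity-planning sanity check: do N servers of a given
# (assumed) spec cover the estimated totals? The per-server numbers are
# placeholders for illustration, not our real hardware specs.

required = {"cores": 240, "ram_tb": 2.0, "disk_tb": 100.0}

per_server = {"cores": 32, "ram_tb": 0.25, "disk_tb": 14.0}  # assumed spec
num_servers = 8

totals = {k: v * num_servers for k, v in per_server.items()}

for resource, needed in required.items():
    have = totals[resource]
    status = "OK" if have >= needed else "SHORT"
    print(f"{resource}: need {needed}, have {have} -> {status}")
```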
Once we had a clear projection for our server needs, it was time to choose our specs. One key consideration was the need for servers with KVM (keyboard, video and mouse) capabilities since we’d be accessing them from remote machines.
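Since servers of this class also expose an out-of-band management channel (IPMI, via Dell's iDRAC), remote checks don't even require the operating system to be up. As a rough illustration – the host name, credentials and exact tooling below are placeholders, not our production setup – here is a small wrapper around the standard ipmitool CLI:

```python
# Minimal sketch: query a server's power state over IPMI (out-of-band
# management), assuming the servers expose a standard IPMI/iDRAC interface.
# Hostname and credentials below are placeholders, not real values.
import subprocess

def power_status(host, user, password):
    """Return the raw output of 'ipmitool chassis power status' for one host."""
    result = subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password,
         "chassis", "power", "status"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(power_status("idrac-server01.example.local", "admin", "changeme"))
```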
Availability of spare parts was also a key consideration. The most easily available model that could quickly scale to our requirements was the Dell PowerEdge R910 server, which comes with a 16-bay 2.5" backplane, a powerful RAID controller and an internal battery. We also liked the Dell Compellent SC200 for storage, paired with a PERC H800 RAID controller card, so we went all-in with Dell for our core hardware.
For GPU-intensive development and model training (think ‘machine learning’ type tasks), we opted for off-the-shelf consumer hardware (single Intel i7-7700-class CPUs and ASUS gaming motherboards) in rack-mounted cases large enough to host several Nvidia GeForce 1070-series graphics cards.
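A quick way to verify one of these GPU boxes after assembly is to list the cards the driver can see. This sketch shells out to the standard nvidia-smi utility and assumes the Nvidia drivers are already installed; it is an illustration, not part of our build scripts:

```python
# Sketch: list the GPUs visible on a freshly assembled training box using
# the standard nvidia-smi utility (assumes Nvidia drivers are installed).
import subprocess

def list_gpus():
    """Return one 'name, total memory' line per detected GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in out.stdout.splitlines() if line.strip()]

if __name__ == "__main__":
    for gpu in list_gpus():
        print(gpu)  # e.g. "GeForce GTX 1070, 8192 MiB"
```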
Now that we knew what equipment we wanted, we needed to find a reliable supplier. After much research online, looking at referrals, reliability, quality of the product, support services and the overall vibe, we made a supplier selection, while earmarking a secondary supplier as a fallback plan.
The day the servers arrived, we knew we’d made the right choice. All the internal components were in excellent condition, and we’d gotten our servers at a bargain price. Once our server farm was up and running, we’d soon be operating at a significantly lower cost than we’d previously been paying for our cloud servers.
Finding a server room
So, we had all our equipment and a plan, but the question was: where would we put it all? We needed a location that was cool enough to keep our servers comfortable, secure enough to prevent break-ins, and wired for VLAN access, which we needed both for security and for management.
As it turned out, the ideal location was right under our noses – literally! Once we’d run a couple of Cat-6 FTP cables to the basement of our building and bought some uninterruptible power supply (UPS) units, a stack of patch cords and all the other necessary accessories, we were ready to go.
Installation and Setup
Before we installed any of the servers, we brought them into our office, inspected them thoroughly for defects, and configured them to be ready to run as soon as we booted them up. Then we carried them down to the basement one by one; after all, IT people need exercise too!
Every server came with its own rails, so we began by mounting those, then lifted each server onto its rack – which also turned out to be a more intense workout than we’d expected!
We followed a pre-determined mounting sequence designed to maximize airflow, so that every server could draw fresh air through its front panel.
Once completely assembled, our new servers booted up without a hitch. Now it was time for the serious test: running heavy-duty code on one of them... which it handled without breaking a sweat. All the hard work was paying off!
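For anyone who wants to reproduce a similar smoke test, here is a minimal, self-contained CPU burn-in sketch; our actual test was our own processing code, not this script:

```python
# Minimal CPU burn-in sketch: keep every core busy for a fixed period and
# report how many iterations each worker completed. This is an illustrative
# smoke test, not the actual workload we ran on the new servers.
import multiprocessing as mp
import time

def burn(seconds):
    """Spin on floating-point work for `seconds` and return the iteration count."""
    end = time.time() + seconds
    count = 0
    x = 0.0001
    while time.time() < end:
        x = (x * x + 1.0) % 1e6
        count += 1
    return count

if __name__ == "__main__":
    duration = 30.0  # seconds per worker
    with mp.Pool(mp.cpu_count()) as pool:
        results = pool.map(burn, [duration] * mp.cpu_count())
    print(f"{len(results)} workers, {sum(results):,} total iterations in {duration}s")
```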
Finally, we created an interface between the UPS batteries and our power-monitoring system, so we could manage our power needs in the event of an extended outage.
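As one example of what such an interface can look like, this sketch polls a UPS through the upsc command from Network UPS Tools and warns on a low battery; the UPS name and threshold are illustrative assumptions rather than our production configuration:

```python
# Hedged sketch: poll a UPS via Network UPS Tools' `upsc` CLI and warn when
# the battery charge drops below a threshold. The UPS name and threshold are
# assumptions for illustration, not our actual monitoring stack.
import subprocess

def read_ups(ups_name="ups@localhost"):
    """Return the key/value variables reported by `upsc` for one UPS."""
    out = subprocess.run(["upsc", ups_name], capture_output=True, text=True, check=True)
    variables = {}
    for line in out.stdout.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            variables[key.strip()] = value.strip()
    return variables

if __name__ == "__main__":
    status = read_ups()
    charge = float(status.get("battery.charge", "0"))
    print(f"UPS status: {status.get('ups.status', 'unknown')}, charge: {charge:.0f}%")
    if charge < 50.0:
        print("WARNING: battery below 50% - consider a graceful shutdown")
```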
Since we were also concerned about maintaining healthy operating conditions, we built and installed our own sensors from scratch to monitor the temperature and humidity in the rack environment.
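As a sketch of the kind of reading loop a home-built sensor rig might run – assuming a DHT22 sensor wired to a Raspberry Pi and read through the Adafruit_DHT library, which is not necessarily what we used – it can be as simple as this:

```python
# Sketch: read temperature and humidity from a DHT22 sensor wired to a
# Raspberry Pi GPIO pin, using the Adafruit_DHT library. Sensor model, pin
# number and thresholds are assumptions; the article does not give our design.
import time

import Adafruit_DHT  # pip install Adafruit_DHT

SENSOR = Adafruit_DHT.DHT22
GPIO_PIN = 4            # assumed wiring
MAX_TEMP_C = 27.0       # illustrative alert threshold
POLL_SECONDS = 60

while True:
    humidity, temperature = Adafruit_DHT.read_retry(SENSOR, GPIO_PIN)
    if humidity is not None and temperature is not None:
        print(f"temp={temperature:.1f}C humidity={humidity:.1f}%")
        if temperature > MAX_TEMP_C:
            print("ALERT: rack intake temperature above threshold")
    time.sleep(POLL_SECONDS)
```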
In the end, we built our very first in-house data center in just three weeks. It took a good amount of time, planning and hard work to get everything built properly, to our own exacting specifications. However, the results have been more than worth the effort!
Now, with all this high-powered hardware in place, we are better equipped to serve our clients than ever before, while ensuring the speed, security, and availability to meet the ever-growing demand on our data center.
Stay tuned for part 2 in the near future!