Data Infrastructure
The technology world has come a long way in a short time. What was considered best practice a few years ago is now out of date and old-fashioned. I still remember when “SOA” was a thing and OLAP cubes were how people handled big-data analytics. Though some parts of the world may still use those terms and technologies, Silicon Valley has zoomed way past them.
What hasn’t changed, though, is the importance of accessing and retrieving data efficiently. If anything, the importance of ready access to one’s data has grown immensely with the emergence of large-scale machine learning, and that holds true for both large and small companies.
At Apteo, we spent a lot of time on our Phase 1 R&D. It was heavily geared towards utilizing the best ML methods available to us while finding and integrating new data sources that we found to be useful for our purposes. We came out of this effort with what we branded as our “V1” platform — a set of deep networks, analytical techniques, datasets, and a strategy for putting them all together into a smart index investment product.
V1 was a lot of fun. Not only did we learn a lot, but we also worked on some really fascinating data science problems, all while coming together as a small but productive team.
We’re now getting close to what we’re calling our “V2” platform. In contrast to V1, this version has been all about infrastructure. We’ve spent the past couple of months solidifying our prediction mechanisms, creating tools for generating our ML models robustly and repeatably, implementing best practices in both technology and data science, squashing bugs, and paying off tech debt. When we had a chance, we also built out a user-facing dashboard.
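To give a flavor of what “generating models repeatably” can mean in practice, here is a minimal, hypothetical sketch (not our actual code): pin the random seed and derive a stable run identifier by hashing the full training configuration, so any run can be reproduced and compared later. The function and parameter names are illustrative assumptions.

```python
import hashlib
import json
import random

def make_run_config(params: dict, seed: int = 42) -> dict:
    """Build a reproducible run config with a stable, hash-based run ID.

    Hypothetical example: a real pipeline would also seed numpy, torch,
    etc., and record dataset and code versions alongside the params.
    """
    random.seed(seed)  # pin Python's RNG so the run is repeatable
    config = {"seed": seed, "params": params}
    # Hashing the canonical (sorted-key) JSON gives every distinct
    # configuration its own deterministic identifier.
    config["run_id"] = hashlib.sha256(
        json.dumps(config, sort_keys=True).encode()
    ).hexdigest()[:12]
    return config

cfg = make_run_config({"model": "deep_net", "learning_rate": 0.001})
```

The point of the content hash is that two runs with identical configs get identical IDs, which makes it trivial to detect duplicate experiments and to tie a stored model artifact back to exactly the settings that produced it.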
Now why would we actually spend all this time on backend engineering tasks when we have an entire world of data science and user-facing product to build?
Because without it, we would be far too unproductive and slow in the future.
The fact we had to face is that data infrastructure enables everything else in our world, and I suspect the same is true for nearly every machine learning company out there.
Before we solidified our infrastructure, our jobs would error out every so often, we wouldn’t reliably get the predictions we needed to move our investment strategy forward, we had very little insight into the status of our scheduled apps, and we would spend far too much time in devops mode.
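As an illustration of the kind of fix involved, here is a hedged, hypothetical sketch of one small reliability pattern: wrapping a scheduled job with retries and status logging, so a transient failure gets retried with backoff and every outcome is visible instead of silently erroring out. The names (`run_with_retries`, `backoff_seconds`) are illustrative, not our actual tooling.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scheduled_jobs")

def run_with_retries(job, max_attempts: int = 3, backoff_seconds: float = 1.0):
    """Run a zero-argument job, retrying on failure with linear backoff.

    Logs each attempt so the status of scheduled work is observable;
    re-raises the last exception once all attempts are exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = job()
            log.info("job succeeded on attempt %d", attempt)
            return result
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise  # surface the failure to the scheduler/alerting
            time.sleep(backoff_seconds * attempt)
```

In a real setup the log lines would typically feed a monitoring or alerting system, which is what turns "the job errored out again" from a surprise into a notification.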
All of this took time away from what we should really be spending our time on: research, development, and analysis of better investment models.
We’re small, but this issue affects companies of every size. Google and Uber have both published posts on the massive infrastructure they built so that their ML engineers and data scientists could easily create and deploy models. The same thing happened at my former company.
Quick and efficient access to data is a luxury that many data scientists don’t get in their jobs. As we grow, we’ll undoubtedly have to rebuild our platform from scratch at least once, if not more often, to accommodate our needs. For now, I’m happy to be nearing the end of a rather long engineering effort that will hopefully pay dividends for the foreseeable future.