Experimentation in Data Science – Rajiv’s blog about data science

Note: This post updates an older write-up to reflect my current opinions.

The Importance of Experimentation

Recently, I built a tool for exploratory data analysis of time series data. I used Copilot to write the code and Gemini for mathematical research and for understanding new literature. This post is about what that project taught me about experimentation in data science, and why it is a critical part of the data science process.

Sure, Copilot was great for writing code and iterating fast, but even for a simple application with just three stages, there were so many iterations, and so many times the app was broken. Maybe Claude would have been a different experience, but I definitely did not have the experience that some people claim: describe the idea broadly and presto, you have a working app. I had to try different things, see what worked and what didn’t, and then try again. This is the essence of experimentation in data science. You have an idea, you try it out, you see what happens, and then you iterate based on what you learn.

Here is what I realized, though: you can have the best tool implementing exactly what you asked for, but in the data science world at least, there are a lot of corner cases that you don’t necessarily see right away. Your tool’s report that it implemented what you wanted may be true (let’s go with that for now), but what you missed may be critical. It reminds me of a bumper sticker I saw about 25-30 years ago that mocked parents who advertised their child’s prestigious school with bumper stickers of their own. It read something like “Your kid may go to [prestigious school], but you are still an idiot”. The point is that real-world applications are complex, with a lot of edge cases, and you need to experiment and iterate to get them right. None of us is clairvoyant, and if you are building an app for a data science task, I definitely don’t buy the idea that you write the requirements broadly and the tool just gives you what you want.
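To make the corner-case point concrete, here is a minimal sketch of the kind of check I mean for time series data. It is not code from my tool; the function name `audit_timestamps`, the `expected_step` parameter, and the sample dates are all made up for illustration. It only looks for three common surprises: duplicate timestamps, out-of-order timestamps, and gaps larger than the expected sampling step.

```python
from datetime import datetime, timedelta


def audit_timestamps(timestamps, expected_step):
    """Return simple problems found in a list of datetimes.

    Hypothetical helper for illustration: it flags duplicates,
    out-of-order neighbours, and gaps larger than expected_step.
    """
    problems = {"duplicates": [], "out_of_order": [], "gaps": []}

    # A timestamp seen more than once is reported once per repeat.
    seen = set()
    for ts in timestamps:
        if ts in seen:
            problems["duplicates"].append(ts)
        seen.add(ts)

    # Compare each timestamp with its immediate predecessor.
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr < prev:
            problems["out_of_order"].append((prev, curr))
        elif curr - prev > expected_step:
            problems["gaps"].append((prev, curr))

    return problems


# Made-up daily series with one repeat, one gap, and one reversal.
start = datetime(2026, 1, 1)
day = timedelta(days=1)
series = [start, start + day, start + day, start + 4 * day, start + 3 * day]

report = audit_timestamps(series, expected_step=day)
for kind, items in report.items():
    print(kind, len(items))
```

A check like this will not catch everything, and which cases matter depends on the data. The point is only that you find these cases by running the data through checks and looking at what comes back, not by reading a report that says the feature was implemented.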

Deployment is another area where I have heard people say that AI will just replace all the DevOps people. I hit one bug in Arrow that came down to different versions of pandas having different initializations for default variables, and I spent a good two hours on it. Every software engineer has war stories about some gnarly bug they would never have imagined. So I am sticking with my belief that AI is a great tool for data science, but at the end of the day, tools are only as good as the people making them.
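One habit that can shorten the hunt for version-dependent bugs like this is recording which package versions are actually installed in each environment and comparing them with what you developed against. The sketch below uses only the standard library's `importlib.metadata`. The helper names, the package list, and the pinned versions are assumptions for illustration (including the guess that Arrow means the `pyarrow` package); it does not reproduce the bug I hit.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def version_mismatches(expected, actual):
    """Return {name: (expected, actual)} for packages whose versions differ."""
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


# Hypothetical pins: replace with the versions you actually developed against.
expected = {"pandas": "2.2.0", "pyarrow": "15.0.0"}

actual = installed_versions(expected)
for name, (want, got) in version_mismatches(expected, actual).items():
    print(f"{name}: expected {want}, found {got}")
```

Running this at startup in both the development and deployment environments would not have fixed the bug, but it makes a version difference visible before two hours of debugging.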

Citation

BibTeX citation:

@online{sambasivan2026,
  author = {Sambasivan, Rajiv},
  title = {Experimentation in {Data} {Science}},
  date = {2026-04-13},
  url = {https://rajivsam.github.io/r2ds-blog/posts/experimentation_in_DS},
  langid = {en}
}

For attribution, please cite this work as:

Sambasivan, Rajiv. 2026. “Experimentation in Data Science.” April 13. https://rajivsam.github.io/r2ds-blog/posts/experimentation_in_DS.