Analysis Engineering

I’ve been working on a post about how I built my DS/ML blog, but I’m a perfectionist (and I’m new to this) so it’s taking me some time to write a post I’m happy with. That said, I do want to build a habit of posting regularly, so I decided to do a “shorter” piece on something I think about a lot … my job title.

Simple Questions and Unclear Answers

In a city like New York, I’m often asked what I do for a living. On the surface, the answer is pretty simple - I work as a policy research analyst at a nonprofit research do-tank. After spending years telling people that simple answer, watching a look of befuddlement creep onto their faces, and then trying to explain my job title before their eyes fully glaze over, I’ve come up with a simpler (but more involved) description - I’m basically a lab assistant for real-world quasi-experiments.

Like a lab assistant in a hard-science lab, I implement parts of broader research agendas set by more senior researchers (usually professors and/or PhDs). I then report results in a way that best informs the broader research question. However, because I work for researchers looking into issues where traditional experiments are impossible and/or unethical, I don’t use beakers and pipettes to run experiments and generate data that I then analyze. Instead, I analyze observational data¹ using scientific-computing languages.

My simpler explanation leverages that lab-assistant analogy to help people understand my day job, but whenever I deploy it I’m reminded that my job description doesn’t do me any favors. In particular, I spend so much time trying to help people understand what I do that I never get to the point of describing the unique challenges that have shaped how I approach my role. So I’m going to try to do a bit of that below.

Naive Assumptions and Unexpected Problems

I originally came in thinking that I would do exactly what I had done for my MSc thesis - write code to analyze data, rinse, repeat, the end 😂. It didn’t take long for me to realize how naive this was. But that realization didn’t solely (or even primarily) come through someone “showing” me the right way to do things. Instead, it came about because the nature of my work was different enough from the kinds of individual one-off analyses I had done in the past (and was very familiar with) that I quickly ran into problems that my existing practices, techniques, and tools couldn’t solve.

These problems (problems of collaboration, maintainability, scalability, reproducibility, replicability, and efficiency) often kept me awake late at night wondering if the work I had done was good enough. Crucially, I almost never had the same concerns about my analysis. Once I figured out how to implement a given research agenda, the analysis (the bit I’d been trained to do) was often trivially easy. It was everything else - things that more senior researchers didn’t really think about, hadn’t been trained to think about, and I was quickly realizing I was being paid (or maybe underpaid?! 🤔) to think about - that worried me.

So how did I solve these problems? Naturally, I did what I was told to do when I first began programming - I googled, I read, and I borrowed very very liberally from other people’s code. Over time, as I began to find solutions to these problems², the way I approached my work fundamentally changed. I stopped thinking about analysis as a one-off exploration that just happens to be done with code and started thinking about results and code as products in and of themselves.

Software Engineering by Another Name

I began to realize that my actual job was creating data products (analytical engines calibrated to answer specific questions)³ for more senior researchers (and other stakeholders) using code. The more I approached my work this way, the more I realized that software engineers had encountered and solved (or mitigated) many of the problems I thought were novel, the more I adopted best practices from software engineering, and the more I began to realize that what I was actually doing was … software engineering. Put differently, I saw that I used the same tools as engineers, and found that adopting practices similar to theirs solved the problems I was encountering, so there was a high chance that my job had been horribly miscast. I was called a research analyst, but what I had been doing this whole time was a version of software engineering.⁴

In short, I realized that my work is actually analysis engineering (applying statistical understanding AND the engineering design process to design, develop, test, maintain, and evaluate analysis code/software⁵), and approaching it as such has made my work life much much easier. Unsurprisingly, this isn’t an original idea. In fact, several other people have been thinking and writing about this same idea for a while. For instance, the analysis-engineering approach is a core building block of Emily Riederer’s RMarkdown Driven Development workflow. Similarly, in a fantastic PeerJ preprint that is absolutely worth a full read⁶, Hilary Parker argues that, given its complexity and likelihood of error, modern analysis engineering has reached a point where encouraging (and teaching) methods and tooling that push users to adopt recognized best practices (particularly those that guard against error during the technical creation of an analysis) is not just possible but actually necessary. With that support, analysts can more easily and naturally create work that is reproducible, accurate, and collaborative. They are also freed up to focus on (and be more productive in) the actual analysis rather than on avoiding common, time-consuming mistakes.

That’s why I think my job title is misleading and why I say I actually do analysis engineering 🤷🏾‍♂️.

Dedicated to craft,

Dami

Footnotes

¹ Data that comes from real-world processes rather than the controlled circumstances of an experiment.
² I’m definitely considering making future posts that talk about these solutions - look for them under the “analysis-engineering” tag.
³ Emily Riederer used and defined this term in a way that firmly lodged it in my conscious unconscious.
⁴ Don’t get me started on how, to the detriment of younger researchers and research as a whole, a lot of senior researchers still don’t see this. Truly, don’t get me started - in an earlier draft I had a whole paragraph that doubled as a very critical, very “clever” analogy, but my editor (read: wife) convinced me that it was a little too on the nose 🤷🏾‍♂️.
⁵ Borrowed and modified very liberally from Google’s definition of software engineering.
⁶ It was so good that I’ll likely continue referencing it in the analysis-engineering series.

Citation

BibTeX citation:

@online{aboaba2024,
  author = {Aboaba, Damilare},
  title = {Analysis {Engineering}},
  date = {2024-02-06},
  url = {https://DAboaba.github.io/shokunin//posts/analysis_engineering},
  langid = {en}
}

For attribution, please cite this work as:

Aboaba, Damilare. 2024. “Analysis Engineering.” February 6, 2024. https://DAboaba.github.io/shokunin//posts/analysis_engineering.