Notes on Notes on Notebooks

We were given a simple "preparation" task for our next "programming language design" class. This paper (despite me being an "inexperienced" reader, programmer, and academic) made my guts twist. What follows is just a quick summary of my ruthless critique and perhaps a reminder for my future self.

Intro

The paper is about notebooks (think Jupyter): what they are, what issues they have, and what state the ecosystem is in. I've used Jupyter and R notebooks to a small extent, and my perspective here comes mainly from my software developer side. There may be some things I've missed due to a logic error (either by omission or stupidity of my brain) or a comprehension issue (although I think my language skills are sufficient).

The content

You can read the paper yourself! It's really short! Reading it on your own would come with the benefit of also forming an opinion on it on your own (which could help you with critiquing my critique ;) )

Now please enjoy my messy list of all sorts of weird things found in this paper:

(implying working with notebooks is) "end-user software engineering"

Damn, are we really giving notebooks the notion of "software engineering"? This is just an awkward stretch, isn't it? Later on, the author himself contradicts the "engineering" part of the statement...

"a trivial crawl of GitHub shows around 10 million notebooks have been checked in"

Trivial as in... without forks? What exactly is the methodology here? Anyway... the main point is: 10 million is not a big deal, especially if you consider that later in the paper the author mentions that only 4% of them give reproducible results and only 25% contain an explanation! Also... since we are calling this "software engineering" (and since the phrase implies some level of generality), did the author even bother checking the contents of the notebooks or categorizing them? I think that way the author would discover interesting distributions across various fields ;) (but hey, I don't have the data, don't trust me). It disturbs me that in the very next sentence the author concludes that a "large number of people are engaged in notebook programming"... What is large? What is the author's estimate? Does the author even have the numbers? (Sure, I am nitpicking, but those are very bold statements and IMO calling them out makes the ridiculous nature of such phrases stand out.) And oh boy! In another sentence, the author states that those people in fact have no idea what SW engineering is! What a weird world. Maybe... just maybe we will conclude that it might not be SWENG after all!

"Source code preparation systems"

HUH? Am I... late to the party? o_O

"Notebooks reside in a browser-based coding environment with no complex installation steps"

Wait what?! Every single system I have worked with had issues on both Windows and Linux. What is this guy talking about? (Later it turns out that this probably just refers to the client-server architecture, and that the browser, i.e. the client, satisfies the description, which I agree with.)

An interactive notebook provides... [rest of the paragraph]

Here I would only connect the paragraph to the SWENG statement. When I think of SW engineering, I think of complex systems. There is no way to have "immediately visible" output or "live code and interactive interpretation" in most (even semi-)complex systems. I find the author's mixing of the term SW engineering into all this mess weird at best and malicious at worst.

"Notebooks are universally popular."

No. Plain no. This sentence itself discredits any effort made in this paper. And I should have stopped taking it so seriously at this point. BTW, the universality here is backed up by the "exponential" growth of the number of notebooks and the current number of notebooks at 10M on GitHub. facepalm

Learner testimony ... ("two people said this")

I chuckled at that one. How is this even published under a real name? They even admit their "course notebooks were short and simple"...

"Notebook programming appropriately fits contemporary coding practice."

Let's stop at this sentence. Let's stop and try to think about what "notebook programming" is, and let's try imagining what the world would look like if it were "contemporary coding practice". Do you want to be a software developer in such a world? Damn. This shit hits like a truck. Moving on to deepen our existential crisis:

"Programmers hunt for useful code snippets via a highly specialized online search. This 'stack overflow mentality' which leads to 'cut-n-paste' coding is increasingly common."

Okay, 3 things here:

Highly specialized online search? As in... a search engine with "exact matching"? To me, this sounds like wordplay meant to immerse the reader, and I find the idea itself ridiculous. But okay, let's... accept this view.
SW engineering and "cut-n-paste" coding tied together really makes me sick. Again, do we... want to live in a world like this? Work with people like this? Do we want to think about SW engineering as some sort of gluing together of junk that we barely understand and whose outcome we are unable to flexibly improve or extend? I cannot accept this. Maybe I am delusional, but this worldview is unacceptable. One could argue that I am only reacting subjectively to the link the author draws between these two worlds. And I am. For me, though, the core of the issue is that the author's binding of software engineering to the cut-n-paste style of notebooks seems to come from what the author presents as an objective interpretation of both the notebook world and, more importantly, the software engineering world. And that view is objectively wrong.
"is increasingly common" - personal jab here: "It's common and I accept that we will just have shitty software :)" - popular does not mean correct (esp. WRT SW engineering)

"...who work in scientific research are the original intended audience for Jupyter. For such scientists, their code is not the primary output"

There! There! Right there! At this point, one should just remove the popularity claims and the ties towards SW Engineering. Mind you, the emphasis on the code not being the primary output is probably a direct acknowledgment of the "shit code" SW Engineering flaw I keep pointing out. I'll leave it at that.

Notebook "modularity" idea

I just find this unusable and wacky, especially when we consider (again) that only a small portion of notebooks are reproducible or contain any explanations... (maybe that would change with modularity, though). Personally, I would call this a "citation" system rather than a "module" system. Again though... the author mentions growing "notebooks to handle large-scale software engineering projects". That sounds... deeply unergonomic to me and to my notion of "using a notebook". What do you even mean by "large software project"!? The lack of definitions and clarity is starting to hurt my head. Judging from the references, it seems the author pretty much bends the idea into "What if notebooks could do the same as languages!", which is not what the reference material says...

"Inheriting" from a notebook

Please. Just. Don't. Shove. OOP. Everywhere. Maybe? crying out blood emoji

Versioning

Two words - glorified diff. Maybe I am missing something but that is about the only thing that the versioning would require. Also... HTML fallback is just terrible in Rmd...
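To back up the "glorified diff" claim, here is a minimal sketch in Python. It assumes the nbformat 4 JSON layout (a top-level `cells` list whose entries carry `cell_type` and `source`, where `source` is either a string or a list of lines) and diffs only the cell sources, ignoring outputs and metadata, which are a common source of noise when you `git diff` raw `.ipynb` files. The helper names `code_sources` and `notebook_diff` are mine (hypothetical), not from the paper or any tool.

```python
import difflib
import json


def code_sources(notebook_json: str) -> list[str]:
    """Return the source text of every cell in an nbformat-4 .ipynb document.

    Assumption: "source" is either a string or a list of line strings,
    so both shapes are normalised to a single string. Outputs and
    metadata are deliberately ignored.
    """
    nb = json.loads(notebook_json)
    sources = []
    for cell in nb.get("cells", []):
        src = cell.get("source", "")
        if isinstance(src, list):
            src = "".join(src)
        # Prefix each cell with its type so the diff shows cell boundaries.
        sources.append(f"# [{cell.get('cell_type', '?')}]\n{src}")
    return sources


def notebook_diff(old_json: str, new_json: str) -> list[str]:
    """Unified diff of the cell sources of two notebooks (outputs ignored)."""
    old_lines = "\n".join(code_sources(old_json)).splitlines()
    new_lines = "\n".join(code_sources(new_json)).splitlines()
    return list(
        difflib.unified_diff(old_lines, new_lines, "old.ipynb", "new.ipynb", lineterm="")
    )
```

Re-running a cell only changes its outputs, so two such versions of a notebook produce an empty diff here. Dedicated tools such as nbdime go much further than this, but the core really is a diff.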

Introspection

Based on... well, escaping the browser sandbox in order to "do stuff". This seems like a waste of words, since the author purposefully limits himself to the browser for ease of installation and cross-platform support, yet wants to violate the basic rules that the sandbox imposes. Then again... why not write an interpreter for the notebooks in JavaScript and run that in a browser? I don't see why this is an issue: people obviously don't need/want this, and above all, nobody is asking for a "meta notebook protocol". Duh.

"they should be able to accomplish elegant, rich, reflective programming in their source code language of choice"

C/C++ - allow us to introduce ourselves...

I burst out laughing after reading this sentence. I too want to fly in this imaginary world that nobody needs and probably only one person wants. Pardon my bluntness here, but... am I stupid, or is this just... a list of ideas with minuscule effort to help/analyze the (IMO interesting) ecosystem?

Conclusion... "We... have pointed out promising solutions, some of which have commenced in development already"

[CITATION NEEDED] [CITATION NEEDED] [CITATION NEEDED]

Also: what did the author actually point out? They mention a few projects, but anything beyond that is... vague. Remember that "meta notebook protocol"? All I can say is that I too can put good-sounding words together and publish an article that can be condensed to half a page... Except I fear the embarrassment, so maybe I should not. LOL.

And after all this, do you really think "this could be a rewarding area for future investigation"? Really?! Rewarding... for whom? face of disbelief

Good/Interesting ideas

Read-only cells and a reset button to reset the notebook environment were things I missed while using some notebook systems (or had to go looking for before I found them).

Regarding modularity via ipynb and import hooks: can this be automated (or is it already)? It would be a relatively cheap way to make notebooks at least somewhat more usable.

Parallelism/concurrency - most of the notebooks are Python anyway. For education purposes, I could see a point but I really don't know if anyone would... benefit from better tooling for C/C++ notebooks...
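For the record, the import-hook idea from above really is cheap to hand-roll. The sketch below is mine, not the implementation of the `ipynb` package and not anything from the paper. Assumptions: nbformat 4 JSON, only `code` cells are executed, only top-level module names are resolved, and IPython magics such as `%matplotlib inline` would make it fail, since they are not plain Python. `NotebookFinder` and `NotebookLoader` are hypothetical names built on the standard `importlib` machinery.

```python
import importlib.abc
import importlib.util
import json
import os
import sys


class NotebookLoader(importlib.abc.Loader):
    """Executes the code cells of an .ipynb file as the body of a module."""

    def __init__(self, path: str):
        self.path = path

    def create_module(self, spec):
        return None  # fall back to default module creation

    def exec_module(self, module):
        with open(self.path, encoding="utf-8") as f:
            nb = json.load(f)
        for cell in nb.get("cells", []):
            if cell.get("cell_type") != "code":
                continue  # markdown/raw cells are skipped
            src = cell.get("source", "")
            if isinstance(src, list):
                src = "".join(src)
            # Caveat: IPython magics (%..., !...) are not valid Python here.
            exec(compile(src, self.path, "exec"), module.__dict__)


class NotebookFinder(importlib.abc.MetaPathFinder):
    """Resolves `import name` to `name.ipynb` in the given directories."""

    def __init__(self, search_dirs: list[str]):
        self.search_dirs = search_dirs

    def find_spec(self, fullname, path=None, target=None):
        if "." in fullname:
            return None  # top-level modules only in this sketch
        for d in self.search_dirs:
            candidate = os.path.join(d, fullname + ".ipynb")
            if os.path.isfile(candidate):
                return importlib.util.spec_from_loader(
                    fullname, NotebookLoader(candidate), origin=candidate
                )
        return None  # let the regular finders handle everything else
```

Usage would be something like `sys.meta_path.insert(0, NotebookFinder(["."]))` followed by a plain `import my_notebook`. Whether anyone should build a "large-scale software engineering project" this way is, of course, a different question.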

Epilogue

After digging through the resources, I found this presentation from JupyterCon 2018 that I think greatly overlaps with my ideas (it is a superset, and it's really good compared to this rant of mine). The presentation introduces a whole bunch of new issues with notebooks that render them unusable for anything more complex than reporting. I also saved it to cloud storage (a version with working GIFs) so that it has a better chance of surviving.

Damn, I had a blast writing this one. It is messy and not really proofread for style, but it is what it is. I really liked the idea of Xe Iaso's characters used in articles to introduce ideas via dialogue/questions. I think this post would benefit from such features.

Last update

2nd October 2024