moving the lamppost

random musings of a molecular biologist turned code jockey in the era of big data and open science.

R: it thinks it’s cooler than me; it’s not wrong

I do a lot of RNA-seq analysis for my research, and we mostly run the popular “Tuxedo” suite of analysis programs (Bowtie, TopHat, Cufflinks, and now the REALLY COOL cummeRbund). As the capitalization suggests, cummeRbund is written in R, a programming language built mostly for doing statistics really well. So I have been working in R a bit more than usual lately, and I realized something. But before I get into that, you should know something:

R and I have a *love:hate* relationship...

It really does have almost mind-blowing capabilities, and it has been thoroughly embraced by the bioinformatics community, at least by its gene expression and related wings. This has resulted in the large family of R libraries housed under the umbrella of the Bioconductor project. Its data display capabilities are spectacular, as anyone who has seen plots generated by the ggplot2 library can attest. Take a look. Oh, and did I mention it does all of its world-class awesomeness completely pro bono? Suffice it to say, R is plain awesome, and this once-niche language, initially popular with academics who couldn’t or simply wouldn’t spring for expensive licenses for more mainstream stats software, is actually starting to take the rest of the data-analysis world by storm [link1, link2].

So that was some of the love side of the relationship.

What I said I came to realize at the beginning of this post has more to do with the other side of the relationship. Basically, R is like that athletic, pretty, popular kid at your high school. I know that sounds like an extraordinary claim, but bear with me. R, like that kid, knows that what it has is pretty darn awesome. And like that kid, it has grown up with people bending over backwards to ask it to the dance, no matter how well or poorly it treated its suitors. But I gotta tell ya, as a Python guy, I find working in R almost offensive. I do it. And don’t get me wrong, I am glad it is there. But it does not let you forget that you need it, not the other way around.

Now I must admit that if I had started with R instead of Perl and Python (hell, or even BASH), I might feel the opposite, but I am not sure about that. It might also be that I am just spoiled by spending most of my time in Python, a language that bends over backwards for me. But Holy Hell, R seems like the classic example of design by committee!

Here are some examples of things that drive me crazy. Please pipe up if you object to any of them or can offer rebuttals that might change my mind.

Things that drive me crazy about R

  1. Object methods seem to live in the global namespace.
  2. Documentation is hit or miss at best and downright less-than-useless in many cases (and it’s not just the authors’ fault)
  3. Slicing arrays is overly complicated
  4. The convention of naming variables with a ‘.’ instead of an ‘_’
  5. The assignment operator (‘<-’) is twice as many characters as it should be

Each point will probably end up getting its own short post.

Object methods seem to live in the global namespace.

This one might be me not being familiar enough with how R treats scope or namespaces, but it sure seems that if I want to call an object method (objMethod) on an R object (rObj), I call objMethod like a global function:

objMethod(rObj)

That sure looks like it lives in the global namespace rather than in its object’s namespace. Here is how this would be done in Python for comparison:

pyObj.objMethod()

You specify the local namespace (pyObj), then call the method that lives in that namespace. It’s an object, so it knows that it is operating on itself. That’s kinda the whole idea of objects.
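To make the Python side of the comparison concrete, here is a minimal sketch using a hypothetical Sample class (the class, method, and data are made up for illustration). It shows that the method is found through the object, which passes itself in as self, and that the method name never lands in the global namespace:

```python
class Sample:
    """Hypothetical stand-in for pyObj's class."""

    def __init__(self, name, counts):
        self.name = name
        self.counts = counts

    def obj_method(self):
        # 'self' is whatever object the method was called on
        return f"{self.name}: {sum(self.counts)} reads"


py_obj = Sample("wt_rep1", [10, 20, 30])

# Method-call syntax looks obj_method up through the object's class...
print(py_obj.obj_method())        # wt_rep1: 60 reads
# ...and behaves like calling the class attribute with the object passed explicitly.
print(Sample.obj_method(py_obj))  # wt_rep1: 60 reads

# obj_method is not a module-level (global) name.
print("obj_method" in globals())  # False
```

The dotted call is effectively shorthand for handing the object to the function stored on its class, which is how the object "knows" it is operating on itself.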

Now this might sound like a trivial difference, and to some extent it is. But it is also kinda dumb in my opinion. For one thing, it needlessly creates confusion when you are trying to learn a code library. Nothing in the code tells me that I cannot call objMethod on non-rObj variables, i.e., that it belongs only to objects of the same type as rObj.

In Python this is unmistakable.
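One practical upshot when learning a new library: because Python methods hang off the object's class, you can ask an object what it supports. A minimal sketch, assuming a hypothetical Alignment class that does not come from any real library:

```python
class Alignment:
    """Hypothetical class standing in for an object from some library."""

    def __init__(self, reads):
        self.reads = reads

    def count_reads(self):
        return len(self.reads)

    def filter_short(self, min_len):
        # Return a new Alignment keeping only reads at least min_len bases long
        return Alignment([r for r in self.reads if len(r) >= min_len])


aln = Alignment(["ACGT", "AC", "ACGTAC"])

# Everything the object exposes is reachable from the object itself.
public = [name for name in dir(aln) if not name.startswith("_")]
methods = [name for name in public if callable(getattr(aln, name))]
print(public)   # ['count_reads', 'filter_short', 'reads']
print(methods)  # ['count_reads', 'filter_short']

print(aln.filter_short(4).count_reads())  # 2
# A plain list does not offer count_reads at all.
print(hasattr(["ACGT"], "count_reads"))   # False
```

Nothing here stops you from misusing a method, but the question "what can I do with this object?" has a direct answer.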

This is actually a major gripe of mine about R: it really doesn’t seem to want you to have an easy time learning new libraries. However, there is another problem. Because objMethod lives in the global namespace, it can be overwritten if you load another library that has an object with a method of the same name. This is exactly why, in most object-oriented languages, methods live in the namespace of their own object (or class).
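The overwriting problem shows up in Python too, as soon as names from different libraries are dumped into one shared namespace. A minimal sketch, assuming two hypothetical libraries lib_a and lib_b that each export a function named summarize; it models only the Python side and makes no claim about how R itself resolves names:

```python
from types import SimpleNamespace

# Two hypothetical libraries that each export a function named summarize.
lib_a = SimpleNamespace(summarize=lambda obj: "library A summary")
lib_b = SimpleNamespace(summarize=lambda obj: "library B summary")

# Mimic dumping both libraries into one shared namespace, roughly what
# "from lib_a import *" followed by "from lib_b import *" would do.
shared = {}
shared.update(vars(lib_a))
shared.update(vars(lib_b))  # lib_b's summarize silently replaces lib_a's

print(shared["summarize"](None))  # library B summary
# Qualified access still reaches the original version.
print(lib_a.summarize(None))      # library A summary


# Methods with the same name on different classes never collide,
# because each lives in its own class namespace.
class GeneTable:
    def summarize(self):
        return "GeneTable summary"


class ReadAlignment:
    def summarize(self):
        return "ReadAlignment summary"


print(GeneTable().summarize())      # GeneTable summary
print(ReadAlignment().summarize())  # ReadAlignment summary
```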

Anyway, if I am wrong about this, please set me straight. If nothing else, I will learn something new about R that will let me get better at it.