Git Mountain

On learning curves, git and GitHub

On 19/Oct/2021 gave a presentation on starting with Git and Github and why you might be interested; click on the link above to download the slides.

I’ve learned a lot in the past years and become extremely aware of it. Maybe because in school it is so expected that you will learn new things, and tests and exams turn out to be more stressful than informative, we don’t tend to celebrate the difference between our current experience and understanding and the ones we had back in the day when we started whatever path we’re in. This time I also learned a lot of things that I didn’t even imagine I could learn: they were not ‘general fields,’ like baking and knitting, nor ‘my field,’ say linguistics. They had to do with programming and web-design: stuff that other people did and that seemed like magic to me. Going through different stages of the learning process let me learn a lot about how learning itself goes. Let me exemplify with git and Github .

The cliff of why

I find we learn best when it is personal. I could take all the introductory curses on R or Python or whatever that I could find, and theoretically learn what a function was, what a loop was, or how to open a file, but I didn’t really learn any of that until they solved a problem that I had. Until it didn’t come from my own interests to open a file, I didn’t see the point.

With git, it took me forever to figure out what to used it for. The first time I use it, it was under someone else’s instructions and I followed them automatically without understanding what they did and what they were necessary for.

Much later, I had to use Github Pages for a project, which required me to use git constantly. It was like a solid vertical cliff in front of me, intimidating in its string of incomprehensible commands, and the mist above my eyes made it impossible to see when that cliff would become a gentler slope. But I had to climb it.

My best friends were Github Desktop when I was working on Windows and a cheat sheet for the command line, which I was consulting all the time. I knew that in order for my code to work I had to push my changes to Github, and before that I had to commit them (writing a descriptive message), and before that I had to add them (I learned stage later), but had no idea why. Until I learned to run a server on my own laptop1, I didn’t know how to test my Javascript code and kept pushing every single tiny change to the repository to see if it even worked, which meant adding and committing and pushing maybe just to fix a comma that was ruining the whole code. Of course, with such frequent and often meaningless commits, the commit messages were useless, but who cared? It was only me using them, and I didn’t even know what for, I was just there for the website.

In any case, for that first period, I used git and GitHub for what I knew I needed it for, and I kept it as basic as possible. I had a website online and the code for it in both my Windows laptop and a Linux server. I added data from the server and changed the code in my laptop. When I was about to modify something, I updated the local version with the appropriate buttons in GitHub Desktop, or with git pull on the console (more or less —sometimes I forgot, which led to discrepancies and struggling and wanting to punch something, but that sure taught me to pull first). Then, after I made the modifications I wanted, I touched the active buttons on Github Desktop (add, commit, push), or in Linux I typed something like this:

mariana@server:~/repository$ git status # To see what there is to add
mariana@server:~/repository$ git add . # Adds everything
mariana@server:~/repository$ git commit -m "Added a comma" # Commits with a message
mariana@server:~/repository$ git push # Publish!

I did it enough times that it became a habit, even though I still didn’t understand what I was doing it for. But that is already something. I had found a small cave on the side of the cliff where I could light a fire, rest and make use of what little I knew, before climbing any further.

Landmarks on the way

Whether you can see what you need a certain skill for or not, facing a long path of unknown stuff to discover can be intimidating. Not only does the cliff look steep, it is misty and cold and it might get dark any time now. You have no idea how much you need to study before it becomes useful, or before you know enough to use the skill for your own needs. You don’t know how it fits with your current skills or with your future needs.

One trick I have for that is to set up small goals or deadlines. See what I can learn in a day or three, or get to “Hello World!” level and build from there.

The key is to be able to reach a concrete point where you clearly have more knowledge and experience than you had when you started, but close enough to the beginning that you still remember what it felt to be ignorant. Making a list of what you’ve learned, writing down a self-compliment or getting yourself some ice-cream are also interesting ways of reinforcing this 😄.

Once I learned how to run a server locally, I didn’t need to stage, commit and push every little change.I could wait until I had a significant modification that I new worked, stage it and commit it with a meaningful message for whoever wanted to see it. As I grew confident with this new skill, I created other Github Pages, other repositories, where I could save and share my work. I split a large miscellaneous repository in different ones for different projects; once I even collaborated with someone (kind of).

What I try to avoid is standing on that new landmark and just look up to the road ahead, whining at what I still have to learn, comparing myself to all those experienced, accomplished geniuses out there. Of course one should keep in mind that there is more to learn, be humble and respect those who have been struggling for years with the issues of a field you just now stepped on. But we all start from complete ignorance and build it up, with whatever time and tools we already have, and we all keep having a path ahead to keep exploring. I find that looking back makes me proud and gives me joy, gives me the energy to go forward, to trust in my own abilities and enjoy my achievements.

And the hill flattens

At some point I was confident enough with my little but solid “knowledge” of git and GitHub to try and expand it. More importantly, I read many articles/books that were not about git but recommended it, and spoke to my inspiring father —who is very convincing, although I resist—, which led me to take it more seriously and understand what I needed git for.

It is not that I didn’t need it before, I just didn’t know it.

You probably need git, you just don’t know it yet. How’s that?

How often do you undo changes on your files, when you’re writing or whatever? Now, after you have saved them, and closed your program, and then opened them again… can you still go back to the previous state?

Git allows you that, by keeping track of all your changes. It doesn’t just do it on one file: the idea is that you have repositories (think of a folder with all the files related to a given project) and git keeps track of any file you add, remove or modify. It is not automatic: you need to stage the changes, and then when you commit them, git takes a snapshot of your full project and stores the differences between the current snapshot and the previous snapshot. And it keeps record of all those snapshots so you can revert to any of them. You can even make separate branches, like alternative paths of your project, where you test different ideas, and then you can decide to merge them or not —and git takes care of it!

That is git, and it is managed locally. GitHub is an online platform2 that makes it easy to share repositories, so you can collaborate with other people on those projects, keeping track of who changes what, on what branches everybody is working, etc. You can also add a wiki on your project and manage issues: like a forum where you or anyone else can report problems and suggestions for the project.

Even without collaboration, it is useful as online backup of your material —you don’t even need to make it public. And if you use R Studio you can very easily integrate it with your R projects; I recommend reading this resource.

So git and GitHub are useful to manage your work —which also means, it can be useful to help you think about how to manage your work. Thanks to both RStudio projects and git repositories, I am more consistent in how I organize my files. I try to separate my projects and keep them relatively independent from each other; I try to back them up quite often, with more or less meaningful commit messages.

I still have a long way to go, but I reached a flatter area of the hill where I can see both the long way I walked and a large field of possibilities. It has touched my understanding of how I work and gave me confidence to explore more. I understand (some of) what git and GitHub can give me, and when I write the commands I have a better idea of what I’m doing.

A valley of possibilities

When I understood what I needed git and GitHub for, something clicked. I suddenly became more proficient and command lines that I didn’t know existed became an everyday thing, such as:

mariana@server:~/repository$ git submodule update --remote mysubmodule

It turns out you can have something called a submodule: a git repository that is called by another git repository as a subfolder.

In my case, I have data that I visualize with a web-based application but also analyze statistically and write reports about. Until a couple of months ago, the directories looked like this:

|-- visualization repository/
    |-- visualization code/
    |-- dataset/
|-- reports/

The code in the reports extracted the data from the dataset folder inside a different directory, which, after I started thinking of everything as separate, autonomous projects, did not seem like a good idea anymore. Luckily, I learned about submodules, which means that I have three repositories at the same level (one of which is private), but both the one with the visualization and the one with the reports have a copy of the dataset inside. I can have copies from different versions of the dataset, but normally what happens is that I change the dataset, and then run the command from above to update the copy in the other repositories.

|-- visualization repository/
    |-- visualization code/
    |-- copy of dataset (submodule)/
|-- reports/
    |-- copy of dataset (submodule)/
|-- dataset/

Locally, it looks redundant: I can see the copy of the files within each folder. But in practice, it makes it easier to keep track of the information and keeps the repository relatively autonomous, in the sense that the code does not look for files that are “outside” of the project folder3.

And that is where I am right now, which is a long way from repeatedly checking a Cheatsheet and blindingly typing the add-commit-push sequence. I started by climbing a cliff because I had to and stopping at the first shelter I found; I moved forward when I could, and at some point the slope became smoother: the more tools I had and, more importantly, the better I understood what the tools were for, the wider my field of options. The path is not so unclear anymore, nor is it steep: it is a fruitful valley to explore indefinitely.

References

Müller, Kirill. 2020. Here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.

  1. I first downloaded XAMPP for Apache, thanks to my cousin Alex’s suggestion, and after almost two years my brother Nicolás suggested python -m http.server, for which I am extremely grateful.↩︎

  2. For now, I use GitHub, but there are also other platforms, like GitLab and BitBucket.↩︎

  3. Something that also helps here in R is the amazing here package (Müller 2020), to always use relative paths within your current project.↩︎

Mariana Montes
Mariana Montes
Doctor in Linguistics

My research interests include cognitive and corpus semantics and visual analytics.

comments powered by Disqus

Related