How to Clean a Git Branch?
Software production is rarely fully controlled. We usually write code that we rewrite later, figuring out iteratively what works and what doesn't. As a result, Git commits can be quite messy if we create them chronologically: a single change can be spread over several commits, and a single commit can contain several changes. In the end, we often reach a state where the commit history doesn't help us understand what was done.
This issue is very common: handling it is often what distinguishes a beginner from an experienced version control user. Others have shared their own perspectives on this topic:
- https://ensiwiki.ensimag.fr/index.php?title=Maintenir_un_historique_propre_avec_Git (FR)
We won't dive deep into the reasons why you may want to clean your branch. In short, you can either invest time doing the right thing now, or save time now at the price of a higher cost later. If you are not a professional, think about your learning priorities. If you are, think about the consequences on:
- yourself: it forces you to review your own code, ensuring that what you push is actually what you intended.
- others: it structures your changes and conveys the intention you had when producing this code, which helps the reviewers, the future colleagues who have to revert your changes (don't take it personally), and so on.
If you are still reading, you probably think that cleaning your Git branch makes sense. So let's dig in.
A Git branch comes with a history of commits, and each commit comes with a content (the committed code) and a message (the text describing the commit). Cleaning a branch requires taking care of each of these aspects. In other words, we want to know:
- How to clean a commit history?
- How to clean a commit content?
- How to clean a commit message?
All three points are related, in the sense that they all contribute to the "story" told by the commit history. We will address each of them separately.
The commit message is the first thing we see when looking at a commit. However, it is often underused, for a simple reason: we do not always put useful information there. Have you never read commit messages like "Correction", "Fix a problem", "Add a new function", or "Remove old stuff"? Surely the commit changes something, adds or removes something, hopefully for the better. But we could guess that much already, so what is this commit for?
Even with more specific details, a recurring issue is that the message focuses too much on what the commit does, for example "Add an uppercase option to the formatting method" or "Create a JSON formatter in the formatter package". Again, we can guess that much by looking at the content of the commit: we can see that a parameter named "upperCase" has been added to a function, that a JsonFormatter class has been added to the formatter package, or that a piece of code has been removed. But telling what has been done does not tell why it was needed. The reason may seem obvious when the change is made, otherwise we wouldn't make it. But after a few months, or after you have moved to another mission and your successor comes back to your change, how does such a message help?
When you write a commit message, think about a complete stranger who will have to maintain your code. This person needs to know the purpose of the change, which is usually what was going wrong before it. For the information to be complete, one needs to know:
- what was missing/wrong/obsolete before the change,
- what has been done to improve it,
- what we have reached with that change.
Point 2 is addressed by the content of the commit, not its message. Think of it as the difference between the code of a function and a comment describing what this function does: whatever the function does, it does because of its code, not because of the comment. A comment can complement the code, but the code decides what actually happens. Similarly, you may provide additional information in the commit message, but rely first on the commit content to show what has been done. In other words, make the changes themselves clear enough for a future reader.
Point 1 is the most important one to address in the commit message. It motivates the change and helps the future reader evaluate it on the spot. Think about it: if this change was made because of issue A, and a recent update offers a better solution to issue A, it might be relevant to revisit this change. Conversely, if this change was made for issue B, the new solution should not be applied before ensuring that we do not regress on issue B. Stating whether the change was for issue A or issue B changes how we interpret it, without having to dig into the code and speculate.
Finally, once we know what needed to be changed (message) and how we changed it (content), we can assume that the need has been fulfilled. If it has not, or only partially, or if particular points of attention need to be highlighted, then the commit message can be enriched to address point 3. This sets up the context for future changes that may do better. Some people consider that this information belongs in the code, not in the commit. Indeed, documenting the code is important, and we read documentation more often than commit history. Others will say that this information belongs in the ticket associated with the change. All these people, as well as those who think it belongs in the commit, are right: it depends on the context. In the end, what matters is that the information is easy to find from where we look for it. Write the information where it is most relevant, and link to it elsewhere.
As a reminder, a commit message is composed of a title (first line) and optional details (remaining lines). Ensure that the title provides the minimal information to understand the purpose of the commit at a glance. Use the details as you see fit to ensure that a reader can access all the additional information easily.
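As an illustration, here is a sketch of a commit whose title states the purpose and whose details cover point 1 (what was wrong before) and point 3 (what was reached and what to keep an eye on). It assumes a Unix-like shell with git and mktemp; the throwaway repository, the report.txt file, and the "legacy importer" scenario are all hypothetical. Passing several -m options is just one way to write a multi-paragraph message without opening an editor.

```shell
set -e
# Throwaway repository for the example (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "label=Total" > report.txt
git add report.txt && git commit -q -m "Initial commit"

echo "label=TOTAL" > report.txt
git add report.txt
# Each -m option becomes a separate paragraph: the first one is the title,
# the following ones form the details.
git commit -q \
  -m "Export report labels in uppercase for the legacy importer" \
  -m "The legacy importer rejects lowercase labels, so exported reports had
to be corrected by hand before each import." \
  -m "Only the report labels change; the legacy importer should be checked
again if new label types are added."

# Print the full message (title, blank line, details).
git log -1 --format=%B
```

The title alone already answers "why", while a reader who needs more can open the details.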
The content of the commit tells what has changed. To understand that part, you should first understand the purpose of the commit, which, if you have followed so far, is the responsibility of the commit message. But you can write a proper message only if the commit content fits the bill.
The main objective when creating a clean commit is atomicity:
- all the changes addressing a purpose should be in the same commit
- all the changes of a commit should address the same purpose
Indeed, if you spread the changes among several commits, it becomes harder to understand a commit without knowing the others. Similarly, if you mix several purposes into a single commit, it becomes harder to understand which change relates to which purpose. By establishing a one-to-one relation between purposes and commits, you focus the attention on a single purpose at a time while ensuring that the commit contains all the relevant information. Here are some examples of changes that are likely to serve the same purpose and belong in the same commit:
- a fix and a (set of) test(s) showing it works
- a new feature (e.g. a new Java class) with its tests and documentation, and the code integrating this feature in the application
- the removal of an obsolete feature and the removal/replacement of all its calls in the application
- the refactoring of a method and all the adapted calls to this method that keep the application working
You don't need to wait until you have all the relevant changes before creating a commit. While developing, it is often a good practice to commit your work in progress and push it to a remote to avoid losing it. But once you have finished, you can go through your commit history to clean it. This is when you have to care about the atomicity of your commits, and the various ways to achieve it are explained in the next section.
The commit history tells, to some extent, how the changes fit together. You can tell a "nice story" by organising your commits in a way that makes the evolution you made easier to understand. Several operations help with that, each of them detailed in a specific article:
- reorder the commits
- split a commit into several ones
- squash several commits into a single one
- delete a commit
- fake a commit (create and revert a commit to add changes without impacting the final result)
Feel free to combine them as you see fit, depending on your needs.
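Reordering, squashing, and deleting commits are commonly done by editing the todo list opened by git rebase -i: moving a line moves the commit, "fixup" (or "squash", which also lets you edit the combined message) melds a commit into the previous one, and "drop" deletes it. The sketch below builds a hypothetical messy history; to run without an editor, it uses the GIT_SEQUENCE_EDITOR environment variable to replace the generated todo list with a prepared one. It assumes a Unix-like shell with git and mktemp.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "readme" > README
git add README && git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

# Hypothetical chronological history: a fix arrives late, a debug commit sneaks in.
echo "parse v1" > parser.txt
git add parser.txt && git commit -q -m "Add parser"
a=$(git rev-parse HEAD)
echo "format" > formatter.txt
git add formatter.txt && git commit -q -m "Add formatter"
b=$(git rev-parse HEAD)
echo "parse v2" > parser.txt
git commit -q -am "Fix parser"
c=$(git rev-parse HEAD)
echo "debug" > debug.txt
git add debug.txt && git commit -q -m "Debug output"
d=$(git rev-parse HEAD)

# The todo list you would write in the editor opened by "git rebase -i":
# "Fix parser" is moved next to "Add parser" and melded into it (fixup),
# and "Debug output" is deleted (drop).
todo=$(mktemp)
cat > "$todo" <<EOF
pick $a Add parser
fixup $c Fix parser
pick $b Add formatter
drop $d Debug output
EOF

# Stand-in for the editor: overwrite the generated todo list with ours.
GIT_SEQUENCE_EDITOR="cp $todo" git rebase -q -i "$base"

git log --oneline
```

Because each commit touches different files, the reordering applies without conflicts; the resulting branch contains only "Add parser" (including its fix) and "Add formatter".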
Don't forget, however, that you are rewriting history! If it only concerns your local work, that is fine. If you rewrite commits that are already shared, make sure to get the go-ahead from your colleagues. Once done, you will have to push your changes with git push --force-with-lease (or git push --force if your Git version is too old to support it); unlike a plain force push, the former refuses to overwrite the remote branch if it has changed since you last fetched it. If you feel unsafe touching the commit history, take the relevant safety measures first.
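Creating a backup branch before rewriting is one simple safety measure. The sketch below shows it together with the final push, assuming a Unix-like shell with git and mktemp, a hypothetical local bare repository standing in for the shared remote, and a branch named feature: after the rewrite, a plain push is rejected, while git push --force-with-lease succeeds because the remote branch still points where we last saw it.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Hypothetical setup: a local bare repository plays the role of the remote.
git init -q --bare remote.git
git init -q repo
cd repo
git config user.name "Example Author"
git config user.email "author@example.com"
git remote add origin ../remote.git

echo "v1" > file.txt
git add file.txt && git commit -q -m "Add file"
git branch -M feature
echo "v2" > file.txt
git commit -q -am "WIP"
git push -q -u origin feature

# Safety measure: keep a pointer to the current state before rewriting it.
# If anything goes wrong: git reset --hard backup/feature
git branch backup/feature

# Rewrite history (here, only the message of the last commit).
git commit -q --amend -m "Update file content to v2"

# A plain push is now rejected because it is not a fast-forward.
if git push -q origin feature 2>/dev/null; then
  echo "unexpected: the plain push was accepted" >&2
  exit 1
fi

# Overwrite the remote branch only if it still matches origin/feature.
git push -q --force-with-lease origin feature
```

Once you are satisfied with the result, the backup branch can be deleted with git branch -D backup/feature.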
Once you master these operations, you can combine them to do more complex things. Here are some examples:
- To add a change to an existing commit, put your change in a new commit, move it next to the other commit, and squash it.
- To merge some commits into the branch yours is based on, move them to the beginning of your branch, then merge them with a fast-forward.
- If you need a peer review before merging them, create a branch on the last commit to merge and share it. If fixes are requested during the review, make sure to create new commits for them. You can then fake the fixes at the beginning of your own branch and delete the reverts to reach a compatible state. The commits and their fixes can be cleaned before being merged; then simply replay your own (unmerged) commits on the updated base branch.
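For the first of these procedures, Git can do the moving and squashing for you: git commit --fixup=<commit> records the change in a commit titled "fixup! <original title>", and git rebase -i --autosquash places it right after its target and marks it as "fixup" in the todo list. In the sketch below, the commits and files are hypothetical, it assumes a Unix-like shell with git and mktemp, and GIT_SEQUENCE_EDITOR=true simply accepts the generated todo list so that it runs without an editor.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "base" > README
git add README && git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

echo "feature v1" > feature.txt
git add feature.txt && git commit -q -m "Add feature"
target=$(git rev-parse HEAD)

echo "other" > other.txt
git add other.txt && git commit -q -m "Add other change"

# A forgotten change that belongs to "Add feature": record it as a fixup.
echo "feature v2" > feature.txt
git commit -q -a --fixup="$target"

# --autosquash moves the fixup commit after its target and marks it "fixup";
# "true" as sequence editor keeps that todo list unchanged.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"

git log --oneline
```

The forgotten change ends up inside "Add feature", which keeps its original message.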
These procedures can be combined and greatly simplified with git rebase -i. Mastering rebasing can save a lot of time when rewriting your history.
For those who still struggle with Git, I strongly recommend proceeding iteratively. Clarify the first issue you want to tackle and select the procedure to apply. Check that the new state of your branch matches what you expect, then move on to the next issue.
If the branch is so messed up that you don't even know where to start, prefer squashing all your commits into a single one. You lose the intermediate history, but you keep the final state in a single commit. From there, you can split the commit as you see fit and reorder the extracted commits to get something easier to understand. A systematic way to proceed is to:
- split the overall commit into multiple atomic commits
- move the commits to group the ones that should be kept together
- squash each group into a single commit
- order the remaining commits to make the story easier to follow
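The squashing and first splitting step can be sketched as follows, assuming a Unix-like shell with git and mktemp and a hypothetical branch whose messy commits mix a new parser with an unrelated configuration change. git reset --soft moves the branch back to its base while keeping all the changes staged, so a single commit can record them; git reset HEAD~1 then unstages them again so they can be committed one purpose at a time (here file by file; git add -p allows finer splits). Grouping, squashing, and ordering the resulting commits is then a matter of git rebase -i.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "option=old" > config.txt
git add config.txt && git commit -q -m "Initial commit"
base=$(git rev-parse HEAD)

# Hypothetical messy history mixing two purposes.
echo "parse" > parser.txt
git add parser.txt && git commit -q -m "wip"
echo "option=new" > config.txt
echo "test parse" > parser_test.txt
git add config.txt parser_test.txt && git commit -q -m "more wip"
echo "parse, fixed" > parser.txt
git commit -q -am "fix"

# Squash everything since the base into a single commit.
git reset -q --soft "$base"
git commit -q -m "Squashed branch"

# Split it again: keep the changes in the working tree, but unstage them.
git reset -q HEAD~1
git add parser.txt parser_test.txt
git commit -q -m "Add parser with its test"
git add config.txt
git commit -q -m "Change the default configuration option"

git log --oneline
```

The final state of the files is the same as before; only the way the history tells it has changed.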
You may think that moving all these commits around means a lot of conflicts to resolve. However, if you design and organise your code properly, you can work with independent pieces that won't conflict with each other when moved. If you happen to have a lot of conflicts, it may be worth reconsidering how your changes are organised.