The Final Report

This blog post describes my GSoC project, along with a short account of how I started contributing to Git. My GSoC '20 project was to convert 'submodule' to a builtin by porting it from shell to C. Initially, Git commands were written in shell, with some instances of Perl as well. As time went on, Git came to run on more platforms and projects grew large (spanning millions of lines of code), which exposed problems in production-level code, such as:

  • Difficulties in portability of code. The submodule shell script uses commands such as echo, grep, cd, test and printf, to name a few. On non-POSIX-compliant systems, one has to use emulation layers to provide such commands, which is a lot of extra work.
  • There is a large overhead involved in calling the command. As commands implemented as shell scripts are not builtins, they tend to make multiple fork() and exec() syscalls to create more child processes, in turn spawning further shells.

These are just a few of the problems. Hence, porting these commands to C became necessary. Many commands, such as git add, git push, etc., have already gone through the porting process. git submodule was one that still needed some effort, so I chose it as my project.
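To make the overhead point concrete, here is a minimal shell sketch. It is not taken from git-submodule.sh, and both function names are made up for illustration. It computes the last component of a path twice: once through an external program, which costs the shell a fork() and an exec(), and once with parameter expansion, which stays inside the running shell.

```shell
# Hypothetical helpers, not part of Git.

# Runs the external 'basename' program: the shell has to fork() a child
# and exec() the binary just to trim a string.
last_component_external () {
	basename "$1"
}

# Uses POSIX parameter expansion: everything up to the last '/' is
# stripped inside the current shell, so no other program is executed.
last_component_builtin () {
	printf '%s\n' "${1##*/}"
}

last_component_external modules/bash
last_component_builtin modules/bash
```

A script that does this kind of thing in a loop over many submodules pays the process-creation cost every time, and that is the overhead a C port avoids.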

Pre-GSoC period

I started contributing to Git quite early, at around the end of January. After taking some advice from my senior (another GSoC student at Git in 2019), I decided to start my journey by picking a microproject from a list of those provided by Git itself. I decided to work on the test file t6025-merge-symlinks.sh (now renamed to t6405-merge-symlinks.sh) by doing the following:

  • Modernise the test file by amending the indentation in the file, correcting the occurrences of tabs and spaces, etc.
  • Use helper functions such as test_path_is_file() instead of test -f since it makes the code more readable and gives better error messages.
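To show why a helper reads better than a bare test -f, here is a rough sketch of what such a helper can look like. The function name path_is_file_sketch and its message are assumptions made for illustration; this is not Git's actual test_path_is_file implementation.

```shell
# Hypothetical stand-in for a helper like test_path_is_file.
# A bare 'test -f' fails silently; this wrapper says which path was
# missing, so a failing test log explains itself.
path_is_file_sketch () {
	if ! test -f "$1"
	then
		echo "File $1 doesn't exist" >&2
		return 1
	fi
}

scratch=$(mktemp)
path_is_file_sketch "$scratch" && echo "found $scratch"
rm -f "$scratch"
path_is_file_sketch "$scratch" || echo "helper reported the missing file"
```

With test -f, the second call would simply return a non-zero status and leave the reader of the log guessing which check failed.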

Later on, I introduced myself to the Git community and was helped by Christian (now my mentor) and started gathering some ideas for my GSoC project. Johannes, Jakub and Derrick advised me on the path of this project. I came to the conclusion that if this project needs to be completed on time, I will have to start a bit early, maybe even before GSoC starts.

I did face some difficulties to begin with and had trouble in understanding the Git code. I started to do some other tasks to help the Git project instead of helping with the code. For instance:

  • I tried to solve the doubts people posted on Git.
  • I tried to comment on others’ work so that I could figure out how the work is to be done by emulating other more experienced contributors.
  • Putting up small patches like these.
  • Helping other GSoC aspirants and commenting on their proposals.
  • I started contributing to the gitFAQ introduced by Brian so as to increase my knowledge about Git and its commands.

With time, my confidence increased and I entered a state of flow. Hence, I started to work on my proposal and made it more detailed and nuanced with every iteration.

After the proposal submission deadline was over, I started working on porting the first, relatively easy subcommand, set-url, from shell to C and somehow delivered a working patch just a day before the results were announced.

To my delight, both the patch and the work I put in for GSoC were successful and I was selected to be a part of the program for Git! My mentors were Christian Couder and Kaartic Sivaraam.

GSoC

The Proposal

I had proposed the following tasks as a part of my GSoC project:

Convert the Git command 'submodule' to a builtin:
 - Port submodule subcommand `set-url` from shell to C
 - Port submodule subcommand `set-branch` from shell to
   C
 - Port submodule subcommand `summary` from shell to C
 - Port submodule subcommand `add` from shell to C
 - Improve the parsing of the command and try to do
   the parsing in the C code instead of shell.

Out of these, the first three are complete while the fourth one is in review. The last one has not been implemented yet.

Phase 1

The first two commands mentioned above, set-url and set-branch, were ported from shell to C. set-url changes the URL of the specified submodule to the one we desire and synchronizes the new remote URL configuration. set-branch sets the default tracking branch of the submodule. The tricky part of this implementation was handling the --default and --branch options.
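As a rough illustration of what the two set-branch options boil down to, the sketch below edits a scratch .gitmodules file directly with git config. The submodule name lib and its URL are made up, and this only mimics the visible effect on the submodule.<name>.branch key as the Git documentation describes it; it is not the ported C code.

```shell
# Scratch directory with a hand-written .gitmodules; the name "lib" and
# the URL are assumptions for illustration only.
dir=$(mktemp -d)
cat >"$dir/.gitmodules" <<'EOF'
[submodule "lib"]
	path = lib
	url = https://example.com/lib.git
EOF

# Roughly the effect of 'git submodule set-branch --branch topic lib':
# record the tracking branch under submodule.lib.branch.
git config -f "$dir/.gitmodules" submodule.lib.branch topic
branch_after_set=$(git config -f "$dir/.gitmodules" --get submodule.lib.branch)

# Roughly the effect of 'git submodule set-branch --default lib':
# remove the key again so the default branch is used.
git config -f "$dir/.gitmodules" --unset submodule.lib.branch
branch_after_default=$(git config -f "$dir/.gitmodules" --get submodule.lib.branch || echo "(unset)")

echo "after --branch: $branch_after_set"
echo "after --default: $branch_after_default"
rm -rf "$dir"
```

The real subcommand also validates its arguments and resolves the submodule name from the path, which this sketch skips.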

Phase 2

I spent my phase 2 trying to implement the summary subcommand. The subcommand shows the commit summary between the given commit (defaults to HEAD) and working tree/index. This is the kind of output we get from the subcommand:

$ git submodule summary 7c1532cab3125ae24f14ee9433b3c673f2964ef1
* bash 4f90705...f10c25c (4):
  > Git: add config for hiding prefix shown in diff
  > tools: add cron script to automatically build,install git(-next)
  > tools: add script to check conflicts in sync(thing) folders
  > Git: use the experimental commit graph feature

My mentors advised me to use the work already done on the subcommand by 2017’s GSoC student Prathamesh Chavan. His work had to be made compatible with recent Git standards, and various functions had to be changed. The work was very detailed and helped us discover some problems in the test script t7401-submodule-summary.sh. Hence, the work seeped into some of Phase 3 as well.

Phase 3

The aforementioned script used git add to add submodules instead of git submodule add, due to which some commands such as git submodule init and git submodule deinit failed to execute correctly. I decided to create a new test script called t7421-submodule-summary-add.sh, which uses git submodule add to add submodules. We decided to cover some niche test cases which weren’t covered by t7401 so as to not make this script redundant. It felt really great to write a test script of my own! Since fixing t7401 would be a long task, it was decided that it would be better to add some notes about the difference between this script and t7421 and save the major work for later.

Later on, I also started work on porting git submodule add to C. This was also done by Prathamesh before, but again, required a great amount of refactoring and corrections. The work is currently under review from the community.

The summary port also received some feedback from Peff, Dscho and Kaartic:

- Peff stated that the unused parameters
  'missing_src' and 'missing_dst' in the function
  'print_submodule_summary()' can be removed.

- Dscho stated that the test script t7421 failed on
  Windows, in particular t7421.4. This was due to the
  fact that the test checked for a particular error
  message by grep-ing it. The message is actually
  different for *nix systems and Windows, hence, the
  failure of the test.

- Kaartic stated that the function
  'verify_submodule_committish()' had wrong placement
  of the asterisk in a function parameter. The asterisk,
  instead of sticking to the variable name, was stuck to
  the data type, i.e., it was 'char* param' instead of
  'char *param'. Hence, this was to be fixed as well.

Therefore, I delivered a fixup patch series later on. The series will merge to master soon.
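The Windows failure in t7421.4 comes down to a general pattern, which can be sketched as follows. Everything here is hypothetical: the command, its two messages and the function names are invented to show the idea and are not the actual test or the actual error text.

```shell
# Hypothetical command under test: it always fails, but the wording of
# its error message depends on the platform (both messages are made up).
platform_dependent_failure () {
	case "$(uname -s)" in
	MINGW*|MSYS*)
		echo "fatal: could not open directory" >&2 ;;
	*)
		echo "fatal: not a directory" >&2 ;;
	esac
	return 128
}

# Brittle: passes only where the message has the expected wording.
brittle_check () {
	platform_dependent_failure 2>&1 | grep -q "not a directory"
}

# Sturdier: only asserts that the command fails, whatever it prints.
robust_check () {
	! platform_dependent_failure 2>/dev/null
}

robust_check && echo "failure detected without matching the message"
```

Dropping the grep, as the fixup series does for t7421.4, trades a little precision for a test that behaves the same on every platform.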

Organisation of work

I kept track of the work I did during GSoC via two means:

  • To keep track of the code I write, I forked git/git, which now contains all the work I have done till date.
    Work and notes:
    - set-url: Due to an error on my part, I lost the branch containing work on set-url. Commit in git/git.
    - set-branch: The work on set-branch is in the branch set-branch. Commit in git/git.
    - summary: The work on summary is covered in branches named summary-.*, with summary-v3-final containing the commits which got merged into git/git. Commit in git/git.
    - summary-fixup: The fixup to summary-v3-final (i.e., ss/submodule-summary-in-c on git/git) is in summary-v3-fixup. Commit in git/git.
    - Work on t7401: The work on t7401 is in the branches summary-v2-t7401.* and summary-t7401.*, with summary-t7401-v3.3 containing the commits which got merged into git/git. Commit in git/git.
    - add: Similarly, the branches subm-add-.* contain the work I have done on add. The branch subm-add-v1 contains the commits which were sent to the List.
  • I also wrote blogs since the start of GSoC, with a blog written for almost every week of GSoC. The blogs contain things I have learned in that particular week along with the issues I faced on the things I did that week.

Patches on the List

The following are the patches I sent to the List for my GSoC project. NOTE: the summaries of the patches are from the “What’s cooking in git.git” mails sent to the List which contain an enumeration of patches pushed to the mailing list.

set-url:

* ss/submodule-set-url-in-c (2020-05-08) 1 commit
  (merged to 'next' on 2020-05-08 at 93e390eb33)
 + submodule: port subcommand 'set-url' from shell to C

 Rewriting various parts of "git submodule" in C continues.

set-branch:

* ss/submodule-set-branch-in-c (2020-06-02) 1 commit
  (merged to 'next' on 2020-06-18 at 8880b35c5a)
 + submodule: port subcommand 'set-branch' from shell to C

 Rewrite of parts of the scripted "git submodule" Porcelain command
 continues; this time it is "git submodule set-branch" subcommand's
 turn.

summary:

-> * ss/submodule-summary-in-c (2020-08-12) 4 commits
     (merged to 'next' on 2020-08-17 at 9bc352cb70)
 + submodule: port submodule subcommand 'summary' from shell to C
 + t7421: introduce a test script for verifying 'summary' output
 + submodule: rename helper functions to avoid ambiguity
 + submodule: remove extra line feeds between callback struct and macro

 Yet another subcommand of "git submodule" is getting rewritten in C.

t7401:

* ss/t7401-modernize (2020-08-21) 5 commits
 + t7401: add a NEEDSWORK
 + t7401: change indentation for enhanced readability
 + t7401: change syntax of test_i18ncmp calls for clarity
 + t7401: use 'short' instead of 'verify' and cut in rev-parse calls
 + t7401: modernize style

 Test clean-up.

Fixup to summary:

* ss/submodule-summary-in-c-fixes (2020-08-27) 3 commits
 - t7421: eliminate 'grep' check in t7421.4 for mingw compatibility
 - submodule: fix style in function definition
 - submodule: eliminate unused parameters from print_submodule_summary()
 (this branch uses ss/submodule-summary-in-c.)

 Fixups to a topic in 'next'.

add:

* ss/submodule-add-in-c (2020-08-24) 1 commit
   submodule: port submodule subcommand 'add' from shell to C

Work in Progress. Review of v1 is complete.

Post GSoC

I plan on supporting the git submodule add patch series post GSoC as well and making sure that it reaches master. Apart from that, I have a couple of other things in mind:

- Port subcommand `update` to C. Some of the code is
  there, yet, the shell script still exists. This has
  to be looked into.
- Improve the test `t7401` based on the remarks above.
- Add support for Hindi in Git.

Final remarks

I loved my experience with Git and had a chance to talk to many experienced and knowledgeable people. Working for Git helped me dive even deeper into C, shell scripting, Linux and Computer Science in general. I’d like to thank:

- Christian Couder
- Kaartic Sivaraam

For mentoring me and helping me out with all the problems I had.
And,

- Heba Waly
- Derrick Stolee
- Jakub Narębski

For mentoring in GSoC and thus making the selection of the three of us possible!

- Johannes Schindelin (Dscho)
- Philip Oakley
- Jeff King (Peff)
- Denton Liu
- Taylor Blau
- Eric Sunshine

For commenting on my patches and giving your critiques.
And finally,

- Junio C Hamano

For keeping the project alive!

I hope to keep contributing to Git in some way or the other! Thank you so much for this opportunity.

Over and out,
Shourya Shukla

Comments

  1. Hey Shourya, Huge congrats on this major milestone. I'm utterly captivated by the way you organised your project and the bond you had with the codebase maintainers, mentors, and previous contributors, and I'm elated to say that I've learned a lot from this.
    Please Shourya, I hope to contribute to Git during Gsoc 2024, and I believe the array of best practices and advice you've communicated via this post really help, but I want more. I want some direction. My email is sergiusnyah@gmail.com, and I'm ready to chat with you on these from the present moment until the day life separates Git and I. Thank you Shourya, you're sweet!

  2. Besides, I'm bookmarking this. I'd have to read this alongside our conversations every week :)
