GSoC Week 10
Addressing comments on my patch
I spent this week addressing the suggestions made by Johannes which, were extremely detailed and long. I am done with most of the suggestions so I will highlight some important ones and what I learned from them.
This week, I saw the ocean for I guess the first time in my life and it was a totally unique experience!
Peaceful Seas
Moving to the comments made by Johannes.
Eliminate find_unique_abbrev()
The shell version of summary
calls git rev-parse
twice for two jobs which are:
-
Checking for a
missing_src
and amissing_dst
i.e., the missing source and destination committish.test $mod_src = 160000 && ! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_src^0 >/dev/null && missing_src=t test $mod_dst = 160000 && ! GIT_DIR="$name/.git" git rev-parse -q --verify $sha1_dst^0 >/dev/null && missing_dst=t
Quoting my mentor Kaartic:
The above code assigns
missing_src
andmissing_dst
based on the
exit status ofrev-parse
only when they correspond to valid
submodules (thetest $mod_src = 160000
andtest $mod_dst = 160000
checks). -
Finding abbreviated committishes (7 characters long) for outputting them later on in the
summary
output.sha1_abbr_src=$(GIT_DIR="$name/.git" git rev-parse --short $sha1_src 2>/dev/null || echo $sha1_src | cut -c1-7) sha1_abbr_dst=$(GIT_DIR="$name/.git" git rev-parse --short $sha1_dst 2>/dev/null || echo $sha1_dst | cut -c1-7)
Quoting my mentor Kaartic:
This gets the abbreviated hash using
rev-parse
without caring about
whether they are submodules. It falls back to abbreviating
sha1_{src,dst}
ifrev-parse
fails. Note however that we don’t
verify if thesrc
ordst
are valid submodules are not.
Since we try to be very frugal with spawning processes and allocating memory in C, aiming to short-circuit as many if
-statements and loops as possible in a fashion which does not hamper the working of the program, we here aim to combine these two spawns of a child process into one. This will make our program more efficient and lighter but the job isn’t an easy one because of a (twisted?) nature of summary
's code i.e., the cmd_summary()
function in shell.
Originally, I had implemented the latter rev-parse
in the form of find_unique_abbrev()
, which at first sight seems good since we eliminated an extra child no? Sadly, that isn’t the case as this function is called by rev-parse
itself and to find a --short=n
version of the hash we give it (n
is passed as argument). Due to the availability of a function which does this job (more on that below), there was the need to eliminate this function. Also, we aim to look for the abbreviation in the submodule and not in the superproject, thus adding another reason not to use the function.
Therefore find_unique_abbrev()
isn’t the best option hence I modified the verify_submodule_committish()
function to do a rev-parse
with short=7
hence fulfilling the purpose of outputting a shortened hash and return the new short hash if it is verified else return NULL
. But as you notice in case (2), even if rev-parse
failed to give a hash (i.e., we got a NULL
), we did a echo $sha1_src | cut -c1-7
meaning sha1_abbr_src
never really stays empty.
To combat this, one way was to do this in generate_submodule_summary()
:
if (S_ISGITLINK(p->mod_src))
src_abbrev = verify_submodule_committish(p->sm_path, oid_to_hex(&p->oid_src));
if (!S_ISGITLINK(p->mod_src) || !src_abbrev)
src_abbrev = xstrndup(oid_to_hex(&p->oid_src), 7);
if (S_ISGITLINK(p->mod_src) && !(src_abbrev = verify_submodule_committish(p->sm_path,oid_to_hex(&p->oid_src))))
src_abbrev = NULL;
So even here the hash remained NULL at the end. But this created another roadblock, which is that there will be a case when the commit will be non-existent or invalid (for example I pass a commit which isn’t actually fetchable), in that we want to warn the user that this hash isn’t really fetchable, but in our case that warning won’t happen since we treat this case differently by making the hash NULL at the end hence not eligible for a correct warning. The warning is not what we are aiming for but rather is my method to explain this; the final hurdle to solve this part was to solve test 14 of t7401
which involved a non-existent commit hence the warning.
So to counter this, a new solution is to do this:
char *src_committish = xstrndup(oid_to_hex(&p->oid_src), 7);
char *dst_committish = xstrndup(oid_to_hex(&p->oid_dst), 7);
At the start of print_submodule_summary()
. What this does is sieve out the faulty cases but also not keep their hashes NULL
by giving them the hash they were. The src_abbrev
and dst_abbrev
variables act as gatekeepers which will help the function differentiate between the right and faulty cases since for the faulty case the hash will be null anyway irrespective of the value of src_committish
or dst_committish
.
TL;DR: We create control variables to be able to differentiate between the right and faulty scenario while retaining the hashes. Earlier we were not retaining those hashes and trying to keep these control variables as hashes and gatekeepers at the same time which clearly isn’t possible. Therefore, we will have to make a compromise at some point or the other.
NOTE: My mentor Kaartic also had another method (which inspired my method) in which the control variables were actually two integers missing_src
and missing_dst
, thus eliminating the additional char*
definitions I created at the end. I personally find his solution better since we will need a control
variable at the end anyway thus contradicting Dscho’s suggestion.
So why not keep it an integer and make things even more simpler? We are waiting for Christian’s analysis on this.
Remove the occurences of is_sm_git_dir
Johannes suggested that the prepare_submodule_repo_env()
in verify_submodule_committish()
will set the GIT_DIR
to the .git/
of the submodule and since our working directory will be the superproject anyway, the whole point of checking if we are in a GIT_DIR
or even checking if it is a bare-repository or not is totally redundant. Therefore, it was better to remove the variable and hence its occurences everywhere thereby saving tons of extra checks and saving one whole indentation level in generate_submodule_summary()
. So, I removed the variable and things did get simplified.
Simplify the range
computation
The range
is actually the range of commits for which we will compute a git log --oneline
. To be more explicit, in the output of running a git submodule summary
:
$ git submodule summary 7c1532cab3125ae24f14ee9433b3c673f2964ef1
* bash 4f90705...f10c25c (4):
> Git: add config for hiding prefix shown in diff
> tools: add cron script to automatically build,install git(-next)
> tools: add script to check conflicts in sync(thing) folders
> Git: use the experimental commit graph feature
The range is 4f90705...f10c25c
. Earlier, we had a range
variable holding %s...%s
where %s
is the source hash and destination hash. The whole system to compute range was overly explicit:
if (S_ISGITLINK(p->mod_src) && S_ISGITLINK(p->mod_dst))
range = xstrfmt("%s...%s", oid_to_hex(&p->oid_src),
oid_to_hex(&p->oid_dst));
else if (S_ISGITLINK(p->mod_src))
range = xstrdup(oid_to_hex(&p->oid_src));
else
range = xstrdup(oid_to_hex(&p->oid_dst));
Just at a glance one can see that the last two conditionals can be combined into one. The more shorter route is to compute the range and store it directly into the args
of the child process:
if (IS_GITLINK(p->mod_src) && IS_GITLINK(p->mod_dst))
argv_array_pushf(&cp_rev_list.args, "%s...%s",
oid_to_hex(&p->oid_src),
oid_to_hex(&p->oid_dst));
else
argv_array_push(&cp_rev_list.args,
IS_GITLINK(p->mod_src) ?
oid_to_hex(&p->oid_src) :
oid_to_hex(&p->oid_dst));
And hence execute this directly using a capture_command()
or a run_command()
whatever pleases you. This leads to my solution.
Some other suggestions
Some other suggestions from Johannes were the following:
-
Improve the
for-status
block since it had some repetitions. This was achieved by checking for the existence of the submodule just at the start of theif
-statements and combining three checks in the firstif
-statement as follows:if (info->for_status && p->status != 'A' && (sub = submodule_from_path(the_repository, &null_oid, p->sm_path)))
Thereby freeing us of one extra indentation level as well.
-
Use
is_submodule_active()
at the end ofprepare_submodule_summary()
instead of spawning a child process. This is a very good suggestions but it is failing for the case when there is not.gitmodules
such as that in the optionfor-status
because this function accesses the.gitmodules
to check for thesubmodule.name.active
configuration. I tried to make a makeshift (it took me long to find this word last night, it has a very good equivalent in Hindi and that is what I was aware of :P) version in which I spawned a process in caseinfo->for_status
was true else did ais_submodule_active()
. This version is better than the way it was originally but it obviously can be made better! -
Compute the error messages in
generate_submodule_summary()
and pass them toprint_submodule_summary()
as achar *
. Earlier we were computing them in the latter but doing this in the former seems more sensible sinceprint_submodule_summary()
should just print what is given to it. I fixed this and it works.
There is other minor stuff too such as refining the commit messages, improving the space and indentations, updating Stefan Beller’s email in the commit messages, etc.
Next steps
The next steps are to solve the remaining 2-3 issues and submit this patch as soon as possible. Some stuff requires guidance from Christian and Kaartic as well. Let’s keep going!
Over and out,
Shourya Shukla
Comments
Post a Comment