GSoC Week 12

Submitting the patch

This week I finally submitted a patch to the List, though it is not the only one I plan on sending. In the last post, I talked about the git log issue, which is thankfully fixed! Some other problems came up, but they are sorted now. The work on summary also revealed some issues in the test script t7401, which prompted me to create an entirely new test script. I also learnt about regular expressions, so that is another thing I will touch on in this post.

The issue with t7401

The test t7401-submodule-summary.sh was written a long time ago, and the command git submodule add did not exist back then. Hence the submodules are added using git add in the test script. This leads to some unexpected behaviour when trying to run commands like git submodule init and git submodule deinit.

I came across this issue while trying to add a test to verify the summary output of a deinitialised submodule. The test I wrote:

test_expect_success 'should not print anything in case of a deinitialised submodule' "
git submodule deinit sm1 &&
git submodule summary >output1 2>&1 &&
git submodule init sm1 &&
git submodule summary >output2 &&
cat >expected <<-EOF &&
* sm1 0000000...$head1 (2):
  > Add foo2

EOF
test_cmp expected output2 &&
test_must_be_empty output1
"

Now, when the test deinitialised the submodule with git submodule deinit, I did not receive any warning, success message, or anything else related to the deinitialisation. I commented out that line to see the effect, and noticed that .gitmodules does not exist, since git submodule init printed an error saying so.
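The missing .gitmodules can be reproduced outside the test suite. The following is a minimal sketch, assuming git is installed; the repository names super and sm1 and the throwaway identity are placeholders, and this is not the setup code from t7401 itself.

```shell
# Work in a throwaway directory so nothing real is touched.
tmp=$(mktemp -d)
cd "$tmp"

# A superproject and a nested repository with one (empty) commit.
git init -q super
git init -q super/sm1
git -C super/sm1 -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'initial commit'

# Plain "git add" records sm1 as a gitlink (mode 160000) in the index...
cd super
git add sm1 2>/dev/null
mode=$(git ls-files -s sm1 | cut -d' ' -f1)

# ...but, unlike "git submodule add", it does not write .gitmodules.
if test -f .gitmodules
then
    has_gitmodules=yes
else
    has_gitmodules=no
fi
echo "mode=$mode has_gitmodules=$has_gitmodules"
```

With no .gitmodules in place, commands such as git submodule init have no configuration to read for sm1, which matches the error I saw.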

This called for a new test file which adds submodules using git submodule add, so that we could write the niche tests we had in mind. Mind you, my C port of summary worked just fine with t7401; it failed only in the cases I described above.

So, I created a new test, called t7421-submodule-summary-add.sh (christened by Christian). The discussion may be viewed here.

t7421-submodule-summary-add.sh

The test script, as the name suggests, adds submodules using git submodule add and performs some tests on them. Christian and Kaartic advised me not to repeat the tests in t7401, since our port already passes them and repeating them would only be redundant. The script consists of the following 4 tests:

  1. summary test environment setup - Set up a test environment with a submodule and a superproject which adds the submodule.
  2. summary output for initialised submodules - This test creates another commit in the submodule, computes the summary based on that and then verifies it.
  3. summary output for deinitialised submodules - This test deinitialises the submodule, checks whether the output is empty, and then reinitialises the submodule to see whether the output is the same as the one in the previous test.
  4. summary output for submodules with changed paths - This test changes the path of the submodule in the superproject and then checks the error message printed as well as the output we receive. To verify the error message, I grep for the submodule’s path instead of comparing the full message, since the message may vary from machine to machine and the test may therefore fail on certain Linux distributions; this was advised to me by Kaartic and Christian.
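The idea behind grepping for the path in the fourth test can be seen with an ordinary shell error. This is only a sketch: it uses a failing cd rather than git submodule summary, and missing_path is a made-up name. The exact wording of the message differs between shells, but the path appears in it either way.

```shell
# Hypothetical path that is assumed not to exist.
missing_path="no-such-submodule-dir"

# Capture the error message; its wording depends on the shell in use.
err=$( (cd "$missing_path") 2>&1 ) || true

# Check only for the path, not the full text of the message.
if printf '%s\n' "$err" | grep -q -F "$missing_path"
then
    path_mentioned=yes
else
    path_mentioned=no
fi
echo "$path_mentioned"
```

Comparing the whole message with test_cmp would tie the test to one particular wording, while the grep only depends on the part we control.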

That covers everything in t7421. It passes the CI build, and all the tests pass with my port as well.

Regular expressions

Regular expressions are a fascinating thing I learned about this week. I was aware of the term and its short form “regex”, but I never really knew what they were used for. After learning about them, I realised that I had more or less used them back when I was into Information Security.

A regex is a sequence of characters that describes a pattern to search for in text. Different characters hold different meanings and have their own significance. Some commonly used characters are (source):

.       - Any Character Except New Line
\d      - Digit (0-9)
\D      - Not a Digit (0-9)
\w      - Word Character (a-z, A-Z, 0-9, _)
\W      - Not a Word Character
\s      - Whitespace (space, tab, newline)
\S      - Not Whitespace (space, tab, newline)

\b      - Word Boundary
\B      - Not a Word Boundary
^       - Beginning of a String
$       - End of a String

[]      - Matches Characters in brackets
[^ ]    - Matches Characters NOT in brackets
|       - Either Or
( )     - Group

Quantifiers:
*       - 0 or More
+       - 1 or More
?       - 0 or One
{3}     - Exact Number
{3,4}   - Range of Numbers (Minimum, Maximum) 

A simple example of searching with a regex:

TEXT:
	Mr. Norman
	Mr Gordon
	Mr. Thomas123
SEARCH: Mr\.? [a-zA-Z]+

This expression will help us pick out all valid names in the above text. The way to read the above search pattern is as follows:

Mr => The abbreviation for Mister, followed by
\. => A literal dot (.), escaped with a backslash
?  => Makes the character preceding it optional,
	  therefore the dot after "Mr" is optional, followed by
" " => The usual space after <Mr>, followed by
[a-zA-Z]  => Any letter of the Latin alphabet, be it uppercase
			 or lowercase
+ => Tells that the item preceding it (i.e.,
	 any letter), can occur one or more times

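Therefore, this search expression will shortlist the first two names and not the third one, since it has digits in it. Strictly speaking, this holds only when the pattern has to match the whole line; on its own, it would still match the “Mr. Thomas” part of the third entry.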
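The same search can be tried from the shell. A minimal sketch, assuming a POSIX grep: -E selects extended regular expressions and -x makes the pattern match whole lines only, which is what keeps the third name out.

```shell
# The three entries from the TEXT above, one per line.
names=$(printf '%s\n' 'Mr. Norman' 'Mr Gordon' 'Mr. Thomas123')

# Keep only the lines that match the pattern in full.
matches=$(printf '%s\n' "$names" | grep -xE 'Mr\.? [a-zA-Z]+')
printf '%s\n' "$matches"
```

Without -x, grep would print the third line as well, because “Mr. Thomas” matches as a substring of it.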

If this were a long list of names (and I am talking of the order of 10^5 names), picking out the valid entries by hand would be very tough, since any prankster could enter a faulty name like “Mr. Thomas123” and thus even break the name parser. Creating a regex for such a case allows us to automate the work and reduce the workload to a huge extent! Regexes can get far more powerful than this, but that is beyond the scope of this post, since I am not here to teach regex. I learnt about regex from here.

Next Steps

Now I have to revamp the commit message of the commit porting summary, and then the patch will be ready to send to the List. This is the patch I submitted to the List.
The second evaluations are approaching as well. I hope they go well!

Over and out,
Shourya Shukla
