#554682 bundle --stdin doesn't work

Package:
git
Source:
git
Description:
fast, scalable, distributed revision control system
Submitter:
Joey Hess
Date:
2017-10-23 12:48:05 UTC
Severity:
important
#554682#5
Date:
2009-11-05 22:20:30 UTC
From:
To:
I do not see this crash with a git-bundle built from current git head.

joey@gnu:~/tmp>mkdir new
joey@gnu:~/tmp>cd new
joey@gnu:~/tmp/new>git init

Initialized empty Git repository in /home/joey/tmp/new/.git/
joey@gnu:~/tmp/new>touch file
joey@gnu:~/tmp/new>git add file
joey@gnu:~/tmp/new>git commit -m add
[master (root-commit) 84fc0dd] add
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 file
joey@gnu:~/tmp/new>echo master | git bundle create ../my.bundle --stdin
zsh: segmentation fault  git bundle create ../my.bundle --stdin

#554682#10
Date:
2009-11-30 13:22:18 UTC
From:
To:
found 554682 git-core/1:1.6.5.3-1
tags 554682 + upstream
thanks

Joey Hess wrote:

Odd.  I can reproduce this with current master, too.

I’ve prepared a test and a fix for the segfault, but they are not
enough to get 'git bundle create bundle --stdin' to actually work.  I
guess I’ll have to actually read through the bundle code.

Jonathan

#554682#19
Date:
2010-01-19 00:26:42 UTC
From:
To:
joey@gnu:~/tmp/new>echo master | git bundle create ../my.bundle --stdin
zsh: segmentation fault  git bundle create ../my.bundle --stdin

I noticed that git bundle --stdin actually attempts to read from stdin
past EOF. You can see this if you manually type into its stdin.

% git-bundle create ../bundle --stdin
master
^D
master
^D
fatal: Refusing to create empty bundle.

The first stdin read is done by the internal call to rev-list --stdin.
The second stdin read is done by the call to setup_revisions(),
which has its own handler for --stdin.

Bug seen with git version 1.6.5.7 / 1.6.6.243.gff6d2

I also tried going back to 22568f0a336ac37ae7329c917857b455839d1d09, but
still see a bug with Adam Brewster's initial code to add --stdin to
git-bundle. That code still tries to read stdin twice. If it sees
"master" both times, it does create a bundle.

#554682#24
Date:
2010-01-19 23:52:51 UTC
From:
To:
Hi,

Current 'next' fails, too.

Some previous Git versions failed with a message like this:

error: unrecognized argument: --stdin'

Other previous Git versions failed at least with a message such as this:

fatal: exec rev-list failed.
error: rev-list died

Something similar happens to me when I run the initial official revision
of git-bundle. The reason is that "git rev-list --boundary
--pretty=oneline --stdin" refuses to run.

The bad versions either segfault, or "refuse to create empty bundles".

And while 8b3dce5 purports to clean things up (even acknowledging that
support code for --stdin was removed from bundle.c!), at that
time git-bundle was obviously not tested/fixed.

Now, I invested a lot of time into the new Git wiki, and into trying to
bisect this (it took many, many more steps than the suggested 13, and
somewhere in between, the number of commits to be tested even increased!).

If you want to fix it, I suggest requiring --stdin to be the only
parameter after the bundle file name, and adding a function using
strbuf_getline() to parse the stdin into a string_list.  Once that is
done, you can substitute the argv for the rev-list call with that list
(for that, you need to prepopulate with "rev-list", "--boundary" and
"--pretty=oneline").  You can reuse that list for the call to
setup_revisions().

Alternatively, you can try to implement the rev-list --boundary by hand
(the --pretty=oneline is only needed to get a boundary marker IIRC),
taking care to reset the commit flags that were set in the process.  (We
need to know the boundary commits before actually starting to write the
pack, because the bundle file format dictates that the boundary commits
are listed as prerequsites in the bundle header.)

If you want to go that route (which is arguably more elegant anyway), I
suggest having a look at the merge_bases() and get_merge_bases() functions
in commit.c, which do something similar (i.e. a revwalk without using
revision.c's functions -- because you cannot tell what flags they will use
in the future, and they have to be reset after the walk).

Ciao,
Dscho

#554682#45
Date:
2010-06-26 06:17:35 UTC
From:
To:
Joey Hess wrote:
to introduce a memory leak for all users of --stdin to make
‘bundle --stdin’ actually work (patch 6).

In the case of bundle --stdin, I think the leak is unavoidable, while
in general, I suspect a parameter is needed to allow the caller to
indicate whether the leak is needed (analogous to save_commit_buffer).

The rest of the patches should be comparatively pleasant.

Patches 1, 3, and 4 give various parts of bundle creation their own
functions, to make the code easier to digest;

Patch 2 teaches the basis discovery code (which currently forks a
rev-list --boundary process) to use the revision walker directly, to
prepare for patch 5.

Patch 5 looked like it would be the main point of the series: it saves
the list of revs including those read from stdin and resets the
revision walker after the boundary is found.  This way, stdin is only
read once and the underlying logic of the revision walk does not need
to be changed significantly.

Even with patch 5, bundle --stdin still finds no revisions to read.
The problem: to write the table of contents, the name passed on the
command line for each ref is needed.  Unfortunately, names passed
through stdin are kept in a temporary buffer and then freed; the
random gibberish read instead does not look like a meaningful ref
name, so no valid toc entries are found.

Patch 6 fixes this, at the cost of a memory leak.  Suggestions for
avoiding the leak would of course be welcome.

Patch 7 adopts a similar “fix” in a context where it is not needed.
This is just for illustration.

Patch 8 is an unrelated enhancement that came along the way.

Thanks to Joey for the report and Johannes for suggestions for fixing
it.

Thoughts?  Tests to add?
Jonathan Nieder (8):
  bundle: split basis discovery into its own function
  bundle: use libified rev-list --boundary
  bundle: split body of list_prerequisites() loop into its own function
  bundle: split table of contents output into its own function
  bundle: reuse setup_revisions result
  revision: Keep ref names after reading them from stdin
  bundle: Keep names of basis refs after discovery
  bundle_create: Do not exit when given no revs to bundle

 bundle.c          |  172 ++++++++++++++++++++++++++++++++++-------------------
 revision.c        |    2 +-
 t/t5704-bundle.sh |    4 +-
 3 files changed, 113 insertions(+), 65 deletions(-)

#554682#50
Date:
2010-06-26 06:19:04 UTC
From:
To:
create_bundle() is getting a little long.  Isolate a small
piece to work on without distractions.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 bundle.c |   58 ++++++++++++++++++++++++++++++++++------------------------
 1 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/bundle.c b/bundle.c
index ff97adc..66948f4 100644
--- a/bundle.c
+++ b/bundle.c
@@ -193,35 +193,14 @@ static int is_tag_in_date_range(struct object *tag, struct rev_info *revs)
 		(revs->min_age == -1 || revs->min_age > date);
 }

#554682#55
Date:
2010-06-26 06:20:05 UTC
From:
To:
The revision walker produces structured output, which should be a
little easier to work with than the text from rev-list.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 bundle.c |   69 ++++++++++++++++++++++++++++++++++---------------------------
 1 files changed, 38 insertions(+), 31 deletions(-)

diff --git a/bundle.c b/bundle.c
index 66948f4..0dd2acb 100644
--- a/bundle.c
+++ b/bundle.c
@@ -196,40 +196,47 @@ static int is_tag_in_date_range(struct object *tag, struct rev_info *revs)
 static int list_prerequisites(int bundle_fd, struct rev_info *revs,
 		int argc, const char * const *argv)
 {
-	const char **argv_boundary = xmalloc((argc + 4) * sizeof(const char *));
-	char buffer[1024];
-	struct child_process rls;
-	FILE *rls_fout;
+	const char **argv_boundary = xmalloc((argc + 1) * sizeof(const char *));
+	struct rev_info boundary_revs;
+	struct commit *rev;

-	memcpy(argv_boundary + 3, argv + 1, argc * sizeof(const char *));
-	argv_boundary[0] = "rev-list";
-	argv_boundary[1] = "--boundary";
-	argv_boundary[2] = "--pretty=oneline";
-	argv_boundary[argc + 2] = NULL;
-	memset(&rls, 0, sizeof(rls));
-	rls.argv = argv_boundary;
-	rls.out = -1;
-	rls.git_cmd = 1;
-	if (start_command(&rls))
-		return -1;
-	rls_fout = xfdopen(rls.out, "r");
-	while (fgets(buffer, sizeof(buffer), rls_fout)) {
-		unsigned char sha1[20];
-		if (buffer[0] == '-') {
-			write_or_die(bundle_fd, buffer, strlen(buffer));
-			if (!get_sha1_hex(buffer + 1, sha1)) {
-				struct object *object = parse_object(sha1);
-				object->flags |= UNINTERESTING;
-				add_pending_object(revs, object, buffer);
-			}
-		} else if (!get_sha1_hex(buffer, sha1)) {
-			struct object *object = parse_object(sha1);
-			object->flags |= SHOWN;
+	memcpy(argv_boundary, argv, (argc + 1) * sizeof(const char *));
+
+	init_revisions(&boundary_revs, NULL);
+	boundary_revs.boundary = 1;
+	if (setup_revisions(argc, argv_boundary, &boundary_revs, NULL) > 1)
+		return error("unrecognized argument: %s", argv_boundary[1]);
+	if (prepare_revision_walk(&boundary_revs))
+		return error("revision walk setup failed");
+
+	while ((rev = get_revision(&boundary_revs))) {
+		if (rev->object.flags & BOUNDARY) {
+			struct strbuf buf = STRBUF_INIT;
+			struct pretty_print_context ctx = {0};
+			enum object_type type;
+			unsigned long size;
+
+			/*
+			 * The commit buffer is needed
+			 * to pretty-print boundary commits.
+			 */
+			rev->buffer = read_sha1_file(rev->object.sha1,
+							&type, &size);
+
+			strbuf_addch(&buf, '-');
+			strbuf_add(&buf, sha1_to_hex(rev->object.sha1), 40);
+			strbuf_addch(&buf, ' ');
+			pretty_print_commit(CMIT_FMT_ONELINE, rev, &buf, &ctx);
+			strbuf_addch(&buf, '\n');
+			write_or_die(bundle_fd, buf.buf, buf.len);
+
+			rev->object.flags |= UNINTERESTING;
+			add_pending_object(revs, &rev->object, buf.buf);
+			strbuf_release(&buf);
+		} else {
+			rev->object.flags |= SHOWN;
 		}
 	}
-	fclose(rls_fout);
-	if (finish_command(&rls))
-		return error("rev-list died");
 	return 0;
 }

#554682#60
Date:
2010-06-26 06:20:47 UTC
From:
To:
No functional change intended.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 bundle.c |   57 +++++++++++++++++++++++++++++++--------------------------
 1 files changed, 31 insertions(+), 26 deletions(-)

diff --git a/bundle.c b/bundle.c
index 0dd2acb..e90b5c5 100644
--- a/bundle.c
+++ b/bundle.c
@@ -193,6 +193,33 @@ static int is_tag_in_date_range(struct object *tag, struct rev_info *revs)
 		(revs->min_age == -1 || revs->min_age > date);
 }

+static void list_prerequisite(int bundle_fd, struct rev_info *revs,
+		struct commit *rev)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct pretty_print_context ctx = {0};
+	enum object_type type;
+	unsigned long size;
+
+	/*
+	 * The commit buffer is needed
+	 * to pretty-print boundary commits.
+	 */
+	rev->buffer = read_sha1_file(rev->object.sha1, &type, &size);
+
+	strbuf_addch(&buf, '-');
+	strbuf_add(&buf, sha1_to_hex(rev->object.sha1), 40);
+	strbuf_addch(&buf, ' ');
+	pretty_print_commit(CMIT_FMT_ONELINE, rev, &buf, &ctx);
+	strbuf_addch(&buf, '\n');
+
+	write_or_die(bundle_fd, buf.buf, buf.len);
+
+	rev->object.flags |= UNINTERESTING;
+	add_pending_object(revs, &rev->object, buf.buf);
+	strbuf_release(&buf);
+}
+
 static int list_prerequisites(int bundle_fd, struct rev_info *revs,
 		int argc, const char * const *argv)
 {
@@ -209,33 +236,11 @@ static int list_prerequisites(int bundle_fd, struct rev_info *revs,
 	if (prepare_revision_walk(&boundary_revs))
 		return error("revision walk setup failed");

-	while ((rev = get_revision(&boundary_revs))) {
-		if (rev->object.flags & BOUNDARY) {
-			struct strbuf buf = STRBUF_INIT;
-			struct pretty_print_context ctx = {0};
-			enum object_type type;
-			unsigned long size;
-
-			/*
-			 * The commit buffer is needed
-			 * to pretty-print boundary commits.
-			 */
-			rev->buffer = read_sha1_file(rev->object.sha1,
-							&type, &size);
-
-			strbuf_addch(&buf, '-');
-			strbuf_add(&buf, sha1_to_hex(rev->object.sha1), 40);
-			strbuf_addch(&buf, ' ');
-			pretty_print_commit(CMIT_FMT_ONELINE, rev, &buf, &ctx);
-			strbuf_addch(&buf, '\n');
-			write_or_die(bundle_fd, buf.buf, buf.len);
-
-			rev->object.flags |= UNINTERESTING;
-			add_pending_object(revs, &rev->object, buf.buf);
-			strbuf_release(&buf);
-		} else {
+	while ((rev = get_revision(revs))) {
+		if (rev->object.flags & BOUNDARY)
+			list_prerequisite(bundle_fd, revs, rev);
+		else
 			rev->object.flags |= SHOWN;
-		}
 	}
 	return 0;
 }

#554682#65
Date:
2010-06-26 06:21:16 UTC
From:
To:
Isolate another piece of create_bundle() to work on.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 bundle.c |   78 +++++++++++++++++++++++++++++++++++---------------------------
 1 files changed, 44 insertions(+), 34 deletions(-)

diff --git a/bundle.c b/bundle.c
index e90b5c5..8ba6479 100644
--- a/bundle.c
+++ b/bundle.c
@@ -245,45 +245,20 @@ static int list_prerequisites(int bundle_fd, struct rev_info *revs,
 	return 0;
 }

#554682#70
Date:
2010-06-26 06:22:12 UTC
From:
To:
Avoid reading stdin twice for bundle --stdin.

Reported-by: Joey Hess <joey@kitenet.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 bundle.c |   57 +++++++++++++++++++++++++++++++++++++--------------------
 1 files changed, 37 insertions(+), 20 deletions(-)

diff --git a/bundle.c b/bundle.c
index 8ba6479..311c554 100644
--- a/bundle.c
+++ b/bundle.c
@@ -193,7 +193,17 @@ static int is_tag_in_date_range(struct object *tag, struct rev_info *revs)
 		(revs->min_age == -1 || revs->min_age > date);
 }

#554682#75
Date:
2010-06-26 06:28:11 UTC
From:
To:
To write a bundle’s table of contents, the name passed on the command
line for each ref is needed.  Unfortunately, names passed through
stdin are put in a temporary buffer and then freed; the random
gibberish that tends to replace them does not look like a meaningful
ref name to bundle_list_refs(), so no valid toc entries are found and
‘git bundle --stdin’ thinks it has been asked to create an empty
bundle.

So teach the revision walker to keep rev names after reading them from
stdin.  This fixes ‘git bundle --stdin’ at the cost of a memory leak.

Reported-by: Joey Hess <joey@kitenet.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 revision.c        |    2 +-
 t/t5704-bundle.sh |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/revision.c b/revision.c
index f4b8b38..cf86af3 100644
--- a/revision.c
+++ b/revision.c
@@ -1022,7 +1022,7 @@ static void read_revisions_from_stdin(struct rev_info *revs, const char ***prune
 			}
 			die("options not supported in --stdin mode");
 		}
-		if (handle_revision_arg(sb.buf, revs, 0, 1))
+		if (handle_revision_arg(xstrdup(sb.buf), revs, 0, 1))
 			die("bad revision '%s'", sb.buf);
 	}
 	if (seen_dashdash)
diff --git a/t/t5704-bundle.sh b/t/t5704-bundle.sh
index ddc3dc5..cc463f3 100755
--- a/t/t5704-bundle.sh
+++ b/t/t5704-bundle.sh
@@ -30,7 +30,7 @@ test_expect_success 'tags can be excluded by rev-list options' '

 '

#554682#80
Date:
2010-06-26 06:29:18 UTC
From:
To:
Without this change, attempts to examine revs->pending.objects[i].name
when debugging produce random giberish.  On the other hand, it
introduces a small per-basis-ref memory leak.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
For illustration.

 bundle.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/bundle.c b/bundle.c
index 311c554..7aff369 100644
--- a/bundle.c
+++ b/bundle.c
@@ -226,8 +226,7 @@ static void list_prerequisite(int bundle_fd, struct object_array *pending,
 	write_or_die(bundle_fd, buf.buf, buf.len);

 	rev->object.flags |= UNINTERESTING;
-	add_object_array(&rev->object, buf.buf, pending);
-	strbuf_release(&buf);
+	add_object_array(&rev->object, strbuf_detach(&buf, NULL), pending);
 }

 static int list_prerequisites(int bundle_fd, struct rev_info *revs,

#554682#85
Date:
2010-06-26 06:31:58 UTC
From:
To:
Return an error instead to help with the libification effort.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
That’s the end of the series.  Thanks for reading.

I was a bit torn about whether to present this as a request for
comment or a patch series ready for application.  On one hand it fixes
a bug; on the other hand, I have very little confidence that it works
well in the presence of arbitrary rev-list options.  Thoughts and
testing would be very welcome.

Good night,
Jonathan

 bundle.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/bundle.c b/bundle.c
index 7aff369..7dd3f65 100644
--- a/bundle.c
+++ b/bundle.c
@@ -349,7 +349,7 @@ static int bundle_list_refs(int bundle_fd, struct rev_info *revs)
 		free(ref);
 	}
 	if (!ref_count)
-		die("Refusing to create empty bundle.");
+		return error("Refusing to create empty bundle.");
 	return 0;
 }

#554682#90
Date:
2010-06-30 17:57:10 UTC
From:
To:
Jonathan Nieder <jrnieder@gmail.com> writes:

Hmm, doesn't it negatively affect later traversal you would need to do if
you smudged the flag bits by running revision traversal like this?

#554682#95
Date:
2010-06-30 18:04:48 UTC
From:
To:
Jonathan Nieder <jrnieder@gmail.com> writes:

You used to walk boundary_revs but now you walk revs that is given by the
caller, exhausting the revs.pending the caller wanted to use later to feed
pack_objects with?

Confused...

#554682#100
Date:
2010-06-30 20:34:51 UTC
From:
To:
Junio C Hamano wrote:

I imagine so.  I fear this would be the first git command to use the
revision walker twice, and I am not sure whether we can really make that
work.

The revision walker uses object flags for the following purposes:

 - marking objects uninteresting/SYMMETRIC_LEFT.  Luckily for us, if
   an object is uninteresting or SYMMETRIC_LEFT for the first
   --boundary walk, it will be likewise for pack-objects, too.

 - history simplification (TREESAME), --cherry-pick (SHOWN),
   --merge (SYMMETRIC_LEFT).  There are already other reasons to
   disallow these features for bundle.

 - add_parents_to_list (ADDED, SEEN).  This one is really worrisome;
   should we walk through again to throw away the added parents?
   Should there be a pass through all revisions to clear the ADDED
   bit?

I’ll figure out a one-pass solution. :(

Jonathan

#554682#105
Date:
2010-06-30 20:37:59 UTC
From:
To:
Junio C Hamano wrote:

Agh!  Typo.  Thanks for catching it.

#554682#110
Date:
2017-10-23 12:43:59 UTC
From:
To:
Good morning,
I hope you are doing great today . My name is Lawson Banku.Kindly
confirm the receipt of my previous letter to you.
Lawson.