Discussion:
broken references in jar manifests
Ricardo Wurmus
2018-03-01 17:11:59 UTC
Permalink
Hi Guix,

we have a problem with jar manifests. When we use the Class-Path
property to ensure that an executable can find its dependencies on the
class path at run-time, we end up with a manifest like this:

--8<---------------cut here---------------start------------->8---
Manifest-Version: 1.0
Class-Path: /gnu/store/i28vi94r8z9f0x02zgkrv87w16ibmqkw-java-htsjdk-2.
10.1/share/java/htsjdk.jar
Created-By: 1.8.0_151 (Oracle Corporation)
Main-Class: picard.cmdline.PicardCommandLine
--8<---------------cut here---------------end--------------->8---

Note that the Class-Path property is broken into two lines. This means
that the reference scanner will miss it and grafting will fail.

Breaking up lines longer than 70 characters is according to the manifest
specification, see
https://docs.oracle.com/javase/7/docs/technotes/guides/jar/jar.html

We cannot just use a wrapper that sets CLASSPATH, because it appears to
be ignored when using a manifest with “Main-Class”. “Main-Class” is
required for executables that can be run like this:

java -jar whatever.jar

A possible work-around might be to patch the class loader of our JDKs to
look at a custom Guix-specific file, which we will include in each jar.
That file would be allowed to have longer lines.

There are two disadvantages:

1) we need to patch the JDK
2) the jars would not do the right thing when executed with a different
JDK (e.g. on a foreign distro).

What do you think?

--
Ricardo
Gábor Boskovits
2018-03-01 18:32:14 UTC
Permalink
Post by Ricardo Wurmus
Hi Guix,
we have a problem with jar manifests. When we use the Class-Path
property to ensure that an executable can find its dependencies on the
--8<---------------cut here---------------start------------->8---
Manifest-Version: 1.0
Class-Path: /gnu/store/i28vi94r8z9f0x02zgkrv87w16ibmqkw-java-htsjdk-2.
10.1/share/java/htsjdk.jar
Created-By: 1.8.0_151 (Oracle Corporation)
Main-Class: picard.cmdline.PicardCommandLine
--8<---------------cut here---------------end--------------->8---
Note that the Class-Path property is broken into two lines. This means
that the reference scanner will miss it and grafting will fail.
Could we modify the reference scanner to find these lines?
Would there be any serious drawback to that?
Post by Ricardo Wurmus
Breaking up lines longer than 70 characters is according to the manifest
specification, see
https://docs.oracle.com/javase/7/docs/technotes/guides/jar/jar.html
We cannot just use a wrapper that sets CLASSPATH, because it appears to
be ignored when using a manifest with “Main-Class”. “Main-Class” is
java -jar whatever.jar
A possible work-around might be to patch the class loader of our JDKs to
look at a custom Guix-specific file, which we will include in each jar.
That file would be allowed to have longer lines.
1) we need to patch the JDK
2) the jars would not do the right thing when executed with a different
JDK (e.g. on a foreign distro).
What do you think?
--
Ricardo
Ricardo Wurmus
2018-03-01 18:54:59 UTC
Permalink
Post by Gábor Boskovits
Post by Ricardo Wurmus
Hi Guix,
we have a problem with jar manifests. When we use the Class-Path
property to ensure that an executable can find its dependencies on the
--8<---------------cut here---------------start------------->8---
Manifest-Version: 1.0
Class-Path: /gnu/store/i28vi94r8z9f0x02zgkrv87w16ibmqkw-java-htsjdk-2.
10.1/share/java/htsjdk.jar
Created-By: 1.8.0_151 (Oracle Corporation)
Main-Class: picard.cmdline.PicardCommandLine
--8<---------------cut here---------------end--------------->8---
Note that the Class-Path property is broken into two lines. This means
that the reference scanner will miss it and grafting will fail.
Could we modify the reference scanner to find these lines?
Would there be any serious drawback to that?
The reference scanner needs to be fast. The scheme version is heavily
optimized to do quickly replace references for the purpose of grafting.
The C++ version will eventually be replaced with the Scheme version as
the daemon gets replaced.

An alternative to recording full references in the manifest file is to
install a “lib” directory that contains symlinks to dependencies. The
manifest can then contain relative paths to these symlinks.

That’s what I did in the package definition for dropseq-tools.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC
https://elephly.net
Gábor Boskovits
2018-03-01 19:08:53 UTC
Permalink
Post by Ricardo Wurmus
Post by Gábor Boskovits
Post by Ricardo Wurmus
Hi Guix,
we have a problem with jar manifests. When we use the Class-Path
property to ensure that an executable can find its dependencies on the
--8<---------------cut here---------------start------------->8---
Manifest-Version: 1.0
Class-Path: /gnu/store/i28vi94r8z9f0x02zgkrv87w16ibmqkw-java-htsjdk-2.
10.1/share/java/htsjdk.jar
Created-By: 1.8.0_151 (Oracle Corporation)
Main-Class: picard.cmdline.PicardCommandLine
--8<---------------cut here---------------end--------------->8---
Note that the Class-Path property is broken into two lines. This means
that the reference scanner will miss it and grafting will fail.
Could we modify the reference scanner to find these lines?
Would there be any serious drawback to that?
The reference scanner needs to be fast. The scheme version is heavily
optimized to do quickly replace references for the purpose of grafting.
The C++ version will eventually be replaced with the Scheme version as
the daemon gets replaced.
An alternative to recording full references in the manifest file is to
install a “lib” directory that contains symlinks to dependencies. The
manifest can then contain relative paths to these symlinks.
I guess I would prefer to do this instead.
Post by Ricardo Wurmus
That’s what I did in the package definition for dropseq-tools.
Unfortunately I cannot find this package. Where should I look for it?
Post by Ricardo Wurmus
--
Ricardo
GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC
https://elephly.net
Ricardo Wurmus
2018-03-01 19:52:03 UTC
Permalink
Post by Gábor Boskovits
Post by Ricardo Wurmus
An alternative to recording full references in the manifest file is to
install a “lib” directory that contains symlinks to dependencies. The
manifest can then contain relative paths to these symlinks.
I guess I would prefer to do this instead.
The thing is that this requires extra patching for packages that build
their own manifest. Maybe we can detect this in the ant-build-system
and take care of this automatically?
Post by Gábor Boskovits
Post by Ricardo Wurmus
That’s what I did in the package definition for dropseq-tools.
Unfortunately I cannot find this package. Where should I look for it?
It’s fresh! It should have arrived in the guix-patches list archives by
now.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC
https://elephly.net
Chris Marusich
2018-03-19 04:24:19 UTC
Permalink
Post by Gábor Boskovits
Post by Ricardo Wurmus
Post by Ricardo Wurmus
Hi Guix,
we have a problem with jar manifests. When we use the Class-Path
property to ensure that an executable can find its dependencies on the
--8<---------------cut here---------------start------------->8---
Manifest-Version: 1.0
Class-Path: /gnu/store/i28vi94r8z9f0x02zgkrv87w16ibmqkw-java-htsjdk-2.
10.1/share/java/htsjdk.jar
Created-By: 1.8.0_151 (Oracle Corporation)
Main-Class: picard.cmdline.PicardCommandLine
--8<---------------cut here---------------end--------------->8---
Note that the Class-Path property is broken into two lines. This means
that the reference scanner will miss it and grafting will fail.
That is quite unfortunate. :-(
Post by Gábor Boskovits
Post by Ricardo Wurmus
An alternative to recording full references in the manifest file is to
install a “lib” directory that contains symlinks to dependencies. The
manifest can then contain relative paths to these symlinks.
I guess I would prefer to do this instead.
Sounds good to me, too.
Post by Gábor Boskovits
A possible work-around might be to patch the class loader of our JDKs to
look at a custom Guix-specific file, which we will include in each jar.
That file would be allowed to have longer lines.
1) we need to patch the JDK
2) the jars would not do the right thing when executed with a different
JDK (e.g. on a foreign distro).
We shouldn't be afraid to experiment with a custom class loader. It may
turn out to be an effective way to "teach" Java how to follow store
references, and the class loader machinery is well pretty documented.

The tricks we (and Nix) do with the linker (e.g., ld-wrapper) to always
use rpaths for C programs and libraries is similar in spirit: if the
machine does not know our rules by default, we must teach the machine
how to play by our rules.
--
Chris
Ricardo Wurmus
2018-03-19 04:47:11 UTC
Permalink
Post by Chris Marusich
Post by Gábor Boskovits
Post by Ricardo Wurmus
Post by Ricardo Wurmus
we have a problem with jar manifests. When we use the Class-Path
property to ensure that an executable can find its dependencies on the
--8<---------------cut here---------------start------------->8---
Manifest-Version: 1.0
Class-Path: /gnu/store/i28vi94r8z9f0x02zgkrv87w16ibmqkw-java-htsjdk-2.
10.1/share/java/htsjdk.jar
Created-By: 1.8.0_151 (Oracle Corporation)
Main-Class: picard.cmdline.PicardCommandLine
--8<---------------cut here---------------end--------------->8---
Note that the Class-Path property is broken into two lines. This means
that the reference scanner will miss it and grafting will fail.
That is quite unfortunate. :-(
Post by Gábor Boskovits
Post by Ricardo Wurmus
An alternative to recording full references in the manifest file is to
install a “lib” directory that contains symlinks to dependencies. The
manifest can then contain relative paths to these symlinks.
I guess I would prefer to do this instead.
Sounds good to me, too.
Actually, I think this wouldn’t work after all. When the references in
the Class-Path property are split up they would be overlooked when
grafting. The references in the separate custom file would be grafted.
The result is that the applications would keep trying to load the old,
ungrafted jars while Guix would think that the package only retained
references to the grafted packages.

Equally bad: the garbage collector would trash these ungrafted
references as it cannot detect them, thus breaking the application.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC
https://elephly.net
Ludovic Courtès
2018-03-02 13:18:09 UTC
Permalink
Heya,
Post by Ricardo Wurmus
Manifest-Version: 1.0
Class-Path: /gnu/store/i28vi94r8z9f0x02zgkrv87w16ibmqkw-java-htsjdk-2.
10.1/share/java/htsjdk.jar
Created-By: 1.8.0_151 (Oracle Corporation)
Main-Class: picard.cmdline.PicardCommandLine
Note that the Class-Path property is broken into two lines. This means
that the reference scanner will miss it and grafting will fail.
Breaking up lines longer than 70 characters is according to the manifest
specification, see
https://docs.oracle.com/javase/7/docs/technotes/guides/jar/jar.html
What would happen if we modified the tool that generates these to no
break lines? Buffer overflow in the class loader?
Post by Ricardo Wurmus
1) we need to patch the JDK
2) the jars would not do the right thing when executed with a different
JDK (e.g. on a foreign distro).
Does #2 really matter? Those jars won’t work on non-Guix systems anyway
precisely because they contain references to the store.
Post by Ricardo Wurmus
An alternative to recording full references in the manifest file is to
install a “lib” directory that contains symlinks to dependencies. The
manifest can then contain relative paths to these symlinks.
Very smart and easy to do, no? To my untrained eye, this looks like the
winning option here. :-)

Thanks,
Ludo’.
Danny Milosavljevic
2018-03-20 10:28:55 UTC
Permalink
(1) How about adding another file to the META-INF directory which is just
like manifest - just without the line breaks?

That could even be automated.

(2) Also, "jar -i" calculates an index for where which package is ("INDEX.LIST").
Is that also hard-word-wrapped? The specification
https://docs.oracle.com/javase/7/docs/technotes/guides/jar/jar.html#JAR%20Index
only mentions the 72 Byte limit for "manifest and signature files" and then specifies
Index File Specification separately.

Either would be good enough for the GC retaining the deps, right?
julien lepiller
2018-03-20 10:50:51 UTC
Permalink
Post by Danny Milosavljevic
(1) How about adding another file to the META-INF directory which is just
like manifest - just without the line breaks?
That could even be automated.
(2) Also, "jar -i" calculates an index for where which package is ("INDEX.LIST").
Is that also hard-word-wrapped? The specification
https://docs.oracle.com/javase/7/docs/technotes/guides/jar/jar.html#JAR%20Index
only mentions the 72 Byte limit for "manifest and signature files" and then specifies
Index File Specification separately.
Either would be good enough for the GC retaining the deps, right?
The issue is with grafts here: the plain file with full reference to the
store
gets grafted and contains the new entry. The Manifest will not be
updated because
of line breaks, and Java will still look for the version in the
manifest. The GC
will collect it because it doesn't see any reference to the old version
in any
file (because of line breaks), and the package is broken.
Danny Milosavljevic
2018-03-20 12:14:18 UTC
Permalink
Hi Julien,

On Tue, 20 Mar 2018 11:50:51 +0100
Post by julien lepiller
The issue is with grafts here: the plain file with full reference to the
store
gets grafted and contains the new entry. The Manifest will not be
updated because
of line breaks, and Java will still look for the version in the
manifest. The GC
will collect it because it doesn't see any reference to the old version
in any
file (because of line breaks), and the package is broken.
Oh okay, so there are two problems.

(The specification says that INDEX.LIST is preferred if it exists)
Ricardo Wurmus
2018-03-20 13:47:41 UTC
Permalink
Post by Danny Milosavljevic
Hi Julien,
On Tue, 20 Mar 2018 11:50:51 +0100
Post by julien lepiller
The issue is with grafts here: the plain file with full reference to the
store
gets grafted and contains the new entry. The Manifest will not be
updated because
of line breaks, and Java will still look for the version in the
manifest. The GC
will collect it because it doesn't see any reference to the old version
in any
file (because of line breaks), and the package is broken.
Oh okay, so there are two problems.
Well, they are closely related. One is that we don’t keep references
because they are broken up in the manifest; this is a minor problem,
because we can propagate inputs. The more important problem is that it
breaks grafts.
Post by Danny Milosavljevic
(The specification says that INDEX.LIST is preferred if it exists)
Interesting. It’s worth playing with it.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC
https://elephly.net
Danny Milosavljevic
2018-03-21 21:07:31 UTC
Permalink
Post by Danny Milosavljevic
(The specification says that INDEX.LIST is preferred if it exists)
Just tried "jar -i" with java-picard after manually editing the class path to be much longer:

INDEX.LIST contains:

JarIndex-Version: 1.0

picard2.jar
picard
picard/analysis
picard/analysis/artifacts
picard/analysis/directed
picard/analysis/replicates
picard/annotation
picard/cmdline
picard/cmdline/programgroups
picard/fastq
picard/filter
picard/fingerprint
picard/illumina
picard/illumina/parser
picard/illumina/parser/fakers
picard/illumina/parser/readers
picard/illumina/quality
picard/metrics
picard/pedigree
picard/reference
picard/sam
picard/sam/markduplicates
picard/sam/markduplicates/util
picard/sam/testers
picard/sam/util
picard/util
picard/vcf
picard/vcf/MendelianViolations
picard/vcf/filter
picard/vcf/processor
picard/vcf/processor/util

/gnu/store/3gq6vpqplak9aj01ya4gsfr2f02nfv0v-java-htsjdk-2.10.1/share/././././././././././././././././././././././././././././././././././././java/htsjdk.jar
htsjdk
htsjdk/samtools
htsjdk/samtools/apps
htsjdk/samtools/cram
...

No wrapping done anywhere (I added ./././ to the manifest in order to make the line very long).
Ricardo Wurmus
2018-03-21 22:58:20 UTC
Permalink
Hi Danny,
Post by Danny Milosavljevic
Post by Danny Milosavljevic
(The specification says that INDEX.LIST is preferred if it exists)
Just tried "jar -i" with java-picard after manually editing the class
[…]
Post by Danny Milosavljevic
No wrapping done anywhere (I added ./././ to the manifest in order to
make the line very long).
That’s great. Thanks for testing.

I’m still not clear on the effect of the INDEX.LIST file. According to
the specs it sounds like this could be used to set a search path for
classes:

When the classloader loads the root jar file, it reads the
INDEX.LIST file and uses it to construct a hash table of mappings
from file and package names to lists of jar file names. In order to
find a class or a resource, the class loader queries the hashtable
to find the proper jar file and then downloads it if necessary.

If this is really so we could turn all propagated-inputs into regular
inputs for all Java packages (in many cases they already are regular
inputs, but this may actually be a mistake).

We would have to test this in a real-world example, such as the
dropseq-tools package, which currently has an unsightly
“record-references” phase. If it is sufficient to record the Java
package inputs in an INDEX.LIST file, we should do this in the
ant-build-system.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC
https://elephly.net
Danny Milosavljevic
2018-03-25 10:19:18 UTC
Permalink
diff --git a/guix/build/ant-build-system.scm b/guix/build/ant-build-system.scm
index 6ce813a00..d09062625 100644
--- a/guix/build/ant-build-system.scm
+++ b/guix/build/ant-build-system.scm
@@ -172,6 +172,18 @@ to the default GNU unpack strategy."
#:allow-other-keys)
(zero? (apply system* `("ant" ,build-target ,@make-flags))))

+(define* (generate-jar-indices #:key outputs #:allow-other-keys)
+ "Generate file \"META-INF/INDEX.LIST\". This file does not use word wraps
+and is preferred over \"META-INF/MAINFEST.MF\", which does use word wraps,
+by Java when resolving dependencies. So we make sure to create it so that
+grafting works."
+ (define (generate-index jar)
+ (invoke "jar" "-i" jar))
+ (every (match-lambda
+ ((output . directory)
+ (every generate-index (find-files directory "\\.jar$"))))
+ outputs))
+
(define* (strip-jar-timestamps #:key outputs
#:allow-other-keys)
"Unpack all jar archives, reset the timestamp of all contained files, and
@@ -232,7 +244,9 @@ repack them. This is necessary to ensure that archives are reproducible."
(replace 'build build)
(replace 'check check)
(replace 'install install)
- (add-after 'install 'strip-jar-timestamps strip-jar-timestamps)))
+ (add-after 'install 'generate-jar-indices generate-jar-indices)
+ (add-after 'generate-jar-indices 'strip-jar-timestamps
+ strip-jar-timestamps)))

(define* (ant-build #:key inputs (phases %standard-phases)
#:allow-other-keys #:rest args)
Chris Marusich
2018-04-01 22:12:42 UTC
Permalink
Hi Danny,

Thank you for writing a patch! It looks good to me. We will need to
apply this to the core-updates branch, right? I think that changes to
the ant-build-system will cause all packages that use it to be rebuilt.
Post by Danny Milosavljevic
+(define* (generate-jar-indices #:key outputs #:allow-other-keys)
+ "Generate file \"META-INF/INDEX.LIST\". This file does not use word wraps
+and is preferred over \"META-INF/MAINFEST.MF\", which does use word wraps,
+by Java when resolving dependencies. So we make sure to create it so that
+grafting works."
Is that the only reason? My understanding is that we want to generate
the JAR indices not only (1) to ensure that grafting will work properly,
but also (2) to ensure that the reference scanner will find all the
store paths produced in the output JAR. Goal (2) is important because
if a store path is used at run-time by the installed software, but the
path is treated as "dead" by the garbage collector, then the path may be
removed during garbage collection, which would break the software. If
you agree, then could you update the docstring with a little more info?
Post by Danny Milosavljevic
+ (define (generate-index jar)
+ (invoke "jar" "-i" jar))
+ (every (match-lambda
+ ((output . directory)
+ (every generate-index (find-files directory "\\.jar$"))))
+ outputs))
Does the jar program find the classes here via the CLASSPATH environment
variable, which was set earlier during the configure phase by the
generate-classpath procedure?
--
Chris
Danny Milosavljevic
2018-04-18 13:25:43 UTC
Permalink
Hi Chris,

On Mon, 02 Apr 2018 00:12:42 +0200
Post by Chris Marusich
Is that the only reason? My understanding is that we want to generate
the JAR indices not only (1) to ensure that grafting will work properly,
but also (2) to ensure that the reference scanner will find all the
store paths produced in the output JAR.
Indeed. I'll update the comment.
Post by Chris Marusich
Post by Danny Milosavljevic
+ (define (generate-index jar)
+ (invoke "jar" "-i" jar))
+ (every (match-lambda
+ ((output . directory)
+ (every generate-index (find-files directory "\\.jar$"))))
+ outputs))
Does the jar program find the classes here via the CLASSPATH environment
variable, which was set earlier during the configure phase by the
generate-classpath procedure?
I don't know. I assumed that it would unpack the jar file again and fiddle with
it in a mostly self-contained way as far as environment variables go.

I'm trying to (setenv "CLASSPATH" #f) and am rebuilding the java packages now...
let's see...
Danny Milosavljevic
2018-05-06 18:09:39 UTC
Permalink
Hi Chris,

On Mon, 02 Apr 2018 00:12:42 +0200
Post by Chris Marusich
Thank you for writing a patch! It looks good to me. We will need to
apply this to the core-updates branch, right? I think that changes to
the ant-build-system will cause all packages that use it to be rebuilt.
I've pushed the patch to core-updates.

I've created (and closed) bug# 31374 in order for us to be able to track it somehow.
Post by Chris Marusich
Does the jar program find the classes here via the CLASSPATH environment
variable, which was set earlier during the configure phase by the
generate-classpath procedure?
I still don't know - I seem to have lost the local clone along the way.

It would be good to know it still. It should be possible to unset the
CLASSPATH environment variable even before installing.

On the other hand, I checked that dependencies listed inside MANIFEST.MF are picked up by
the INDEX.LIST generator by manually editing a JAR file.

(Commenting in this thread so we can track the status)

Loading...