scripts/pythondistdeps: Notes from an attempted rewrite to importlib.metadata

Notes from an attempted rewrite from pkg_resources to importlib.metadata in 2020:
1. While pkg_resources can open a metadata on a specified path
   (Distribution.from_location()), importlib provides access only to
   "installed package metadata", i.e. the the dist-info or egg-info directory
   must be "discoverable", i.e. on the sys.path.
   - Thankfully only the dist/egg-info directory must exist, the
     corresponding Python module does not have to be present.
   - The problems this causes:
     (a) You have to manipulate the sys.path to add the specific location of
         the site-packages directory inside the buildroot
     (b) If you have package "foo" in this newly added directory on sys.path
         and there is some problem and its dist/egg-info metadata are not found,
         importlib.metadata continues searching the sys.path and may discover a
         package with the same name (possibly same version) outside the
         buildroot.
         To get around this, you can manipulate the sys.path to remove all
         other "site-packages" directories. But you have to leave the
         standard library there, because importlib may import other modules
         (in my testing: base64, quopri, random, socket, calendar, uu)
     (c) I have not tested how well it works if you're ispecting metadata of
         different Python versions than the one you run the script with
         (especially Python 2 vs Python 3). This might also cause problems with
         dependency specifiers (i.e. python_version != "3.4")
2. Handling of dependencies (requires) is problematic in importlib.metadata
   - pkg_resources provides a way to separately list standard requires and a
     requires for each "extras" category. importlib does not provide this, it
     only spits out a list of strings, each string in the format:
     - 'packaging>=14',
     - 'towncrier>=18.5.0; extra == "docs"', or
     - 'psutil<6,>=5.6.1; (python_version != "3.4") and extra == "testing"
     you can either parse these with a regex (fragile) or use the external
     `packaging` Python module. `packaging`, however, also doesn't have a great
     support for figuring out extra dependencies, it provides the marker api:
     - <Marker(\'python_version != "3.4" and extra == "testing"\')>
     you can use Marker api to evaluate the condition, but not to parse.
     For parsing you can access the private api Marker._markers:
     - marker._markers=[[(<Variable('python_version')>, <Op('!=')>, \
           <Value('3.4')>)], 'and', (<Variable('extra')>, <Op('==')>, \
           <Value('testing')>)]
     which beyond the problem of being private is also not very useful for
     parsing due to its structure.
   - pkg_resources also provides version parsing, which importlib does not
     and `packaging` needs to be used
   - importlib is part of the standard library, but packaging and its
     2 runtime dependencies (pyparsing and six) are not, and therefore we
     would go from 1 dependency to 3
3. A few minor issues, more in the next section about equivalents.

importlib.metadata.distribution equivalents of pkg_resources.Distribution attributes:
- pkg_resources: dist.py_version
  importlib: # not implemented (but can be guessed from the /usr/lib/pythonXX.YY/ path)
- pkg_resources: dist.project_name
  importlib: dist.metadata['name']
- pkg_resources: dist.key
  importlib: # not implemented
- pkg_resources: dist.version
  importlib: dist.version
- pkg_resources: dist.requires()
  importlib: dist.requires  # but returns strings with almost no parsing done, and also lists extras
- pkg_resources: dist.requires(extras=dist.extras)
  importlib: # not implemented, has to be parsed from dist.requires
- pkg_resources: dist.get_entry_map('console_scripts')
  importlib: [ep for ep in importlib.metadata.entry_points()['console_scripts'] if ep.name == pkg][0]
             # I have not found a better way to get the console_scripts
- pkg_resources: dist.get_entry_map('gui_scripts')
  importlib: # Presumably same as console_scripts, but untested
This commit is contained in:
Tomas Orsava 2020-04-08 18:12:09 +02:00
parent 1639424a51
commit 1634914c2e

View File

@ -11,6 +11,10 @@
# RPM python dependency generator, using .egg-info/.egg-link/.dist-info data
#
# Please know:
# - Notes from an attempted rewrite from pkg_resources to importlib.metadata in
# 2020 can be found in the message of the commit that added this line.
from __future__ import print_function
import argparse
from os.path import basename, dirname, isdir, sep
@ -185,6 +189,9 @@ for f in (args.files or stdin.readlines()):
lower.endswith('.egg-info') or \
lower.endswith('.dist-info'):
# This import is very slow, so only do it if needed
# - Notes from an attempted rewrite from pkg_resources to
# importlib.metadata in 2020 can be found in the message of
# the commit that added this line.
from pkg_resources import Distribution, FileMetadata, PathMetadata, Requirement, parse_version
dist_name = basename(f)
if isdir(f):