Comparing pipenv, pip-tools and poetry

This post compares ways to manage package dependencies in Python. Our old friend, the pip and requirements.txt duo, works fine until it doesn't, so newer packaging tools also address dependency management. This post compares pipenv, pip-tools and poetry.

What problems can arise when using pip and requirements.txt?

To understand what problems can arise when using pip and the requirements.txt file, first let’s talk through the typical way to use this duo in your projects.

First, we create a virtual environment or a Docker container, then we pip install the packages we need. Let's say that we use keras and pandas, installed by running the following terminal commands:

pip3 install pandas
pip3 install keras

When we run the pip3 list command, we will not only see the list of installed packages, but also all their dependencies:

Package             Version
------------------- -------
h5py                2.10.0
Keras               2.3.1 
Keras-Applications  1.0.8 
Keras-Preprocessing 1.1.0 
numpy               1.18.1 
pandas              1.0.1 
pip                 19.0.3 
python-dateutil     2.8.1 
pytz                2019.3 
PyYAML              5.3 
scipy               1.4.1 
setuptools          40.8.0 
six                 1.14.0

After having worked (and coded) for many productive hours, we want to send our project to production, so we might do a pip freeze. Pip freeze takes a snapshot of the current state of our environment. We can write all packages into the requirements.txt file with the pip3 freeze > requirements.txt terminal command. At this point, our packages are pinned (the exact version is indicated in the requirements.txt file).
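
Based on the pip3 list output above, the generated requirements.txt would look something like this (pip freeze omits pip and setuptools by default):

h5py==2.10.0
Keras==2.3.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
numpy==1.18.1
pandas==1.0.1
python-dateutil==2.8.1
pytz==2019.3
PyYAML==5.3
scipy==1.4.1
six==1.14.0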

Now our requirements.txt file contains the installed packages and all their dependencies. When we send our project to production, the next developer can simply run the pip install -r requirements.txt terminal command to install all packages listed in the file. This seems easy enough, and at first glance it works fine.

But now let’s imagine that one of the dependencies of pandas, let’s say pytz, got updated. Let’s imagine that the next version contains some bug fixes. Ideally, we would like to get the newest version of this sub-dependency (assuming that the newest version is still compatible with the pinned version of pandas), but at the same time we want to keep pandas at the pinned version.

Now let's push it even further and imagine the following two scenarios. First, imagine that the old version of pytz leads to a problem when we use pandas. We search for the bug and don't understand why our code worked fine in our environment two weeks ago and doesn't work now. Finally, after some frustration, we find the source of the problem: one of our sub-dependencies got updated, and we have to correct our requirements.txt file to contain the new version. This is worrying, because we need to manually change something that ideally we should not touch.

Okay, we might then decide to pin only the versions of keras and pandas, clearing all sub-dependencies from the requirements.txt file. But then imagine the second scenario, in which both pandas and pytz have a new version, and the pinned version of pandas is incompatible with the new version of pytz. The problem is that pip does not do compatibility checks. Manually adding, correcting or clearing packages in the requirements.txt file does not seem to be the solution. Ideally, we would like to do the following things simultaneously:

1) pin (fix) the versions of the packages we use
2) leave the dependency packages unpinned, so that they are updated automatically when a new version is out
3) update the dependency packages only when they are compatible with the packages we pinned.

New packaging tools aim to address these points. Let's look at pipenv, pip-tools and poetry.

1) pipenv

Pipenv is a package that uses virtualenv and pip together: it combines creating a virtual environment with installing packages. To install pipenv (on macOS), run the command:

brew install pipenv
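
Pipenv can also be installed with pip itself (for instance inside a Docker container):

pip install pipenv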

Now that we have pipenv, running the pipenv shell command creates a virtual environment for the project (if there was none before) and activates it. Inside this shell, we can install our packages (and pin their versions) with the following commands: pipenv install keras==2.3.1, pipenv install pandas==1.0.1.

Now we can see that pipenv first creates a Pipfile and a Pipfile.lock, then adds each installed package to the Pipfile. The Pipfile is similar to the requirements.txt file; however, no dependencies are included in it, only the packages we actually install. It has a slightly different structure, though: instead of listing all packages the way requirements.txt does, it is written in TOML format, with sections such as [packages], [dev-packages] and [requires] separating different kinds of information. Suppose that we need a package only for development (for instance pytest) and not for deployment. We can then run pipenv install pytest --dev, and this command will add the pytest package to the [dev-packages] section of the Pipfile.

[requires]
python_version = "3.7"

[packages]
keras = "==2.3.1" 
pandas = "==1.0.1"

This is convenient: with the pip method, if we wanted to separate the development and deployment packages, we would have had to maintain requirements-test.txt and requirements-dev.txt files separately, and install either all packages or only those included in one of the files, depending on what we need.
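
For comparison, after running pipenv install pytest --dev, the Pipfile would gain a section along these lines (pipenv records "*" when we don't pin a version):

[dev-packages]
pytest = "*"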

Okay, so now we want to send our project to production. By using the pipenv lock command, we do essentially the same thing as we did with the pip freeze command: we create a snapshot of our packages and sub-dependencies at a moment in time and write them into the Pipfile.lock file. This file should never be edited manually.

The Pipfile.lock is written in JSON format; it contains all installed packages (those in the Pipfile plus their dependency packages), with their versions and unique hashes. (The latter are included to make sure that we install the exact same packages during deployment as we did during development.) Our Pipfile.lock looks something like this:

{ "_meta": 
  { "hash": 
    { "sha256": "7a1f626d1e55bd1f569fec8b6422124a3a15466910039ee80adc7591e7664fc5" }, 
     "pipfile-spec": 6, 
     "requires": 
         { "python_version": "3.7" }, 
     "sources": 
         [ { "name": "pypi", 
             "url": "https://pypi.org/simple", 
             "verify_ssl": true } ] }, 
     "default": 
           { "h5py": { "hashes": [ "sha256:063947eaed5f271679ed4ffa36bb96f57bc14f44dd4336a827d9a02702e6ce6b", .... ], 
     "version": "==2.10.0" ..... } } }

The next developer can recreate the last successfully locked environment and packages with the pipenv install --ignore-pipfile command. The --ignore-pipfile option tells pipenv to disregard the Pipfile and use the Pipfile.lock instead.

We can also use the pipenv install or the pipenv install --dev commands (the latter also installs the packages in the [dev-packages] section).
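
To summarize the install variants:

pipenv install                    # install the packages listed in the Pipfile
pipenv install --dev              # also install the [dev-packages]
pipenv install --ignore-pipfile   # install exactly what the Pipfile.lock pins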

So how does pipenv differ from the duo of pip and requirements.txt? First, the Pipfile only contains the core packages we installed (and not their dependency packages), so when we run the pipenv install command, we install our core packages plus the newest compatible versions of their dependencies. When we run the pipenv lock command, those newest compatible versions of the dependency packages are written to the Pipfile.lock with their pinned versions. I say "compatible" versions, as pipenv provides dependency resolution: it checks whether the newest versions of the dependencies are compatible with our core packages, and if not, it will not include them.

Finally, pipenv gives us the flexibility of a deterministic rebuild: we can install all packages from the Pipfile.lock (with the pipenv install --ignore-pipfile command) instead of resolving from the Pipfile.

pipenv also provides some extra features. For instance, we can visualise our core and dependency packages with the pipenv graph command:

Keras==2.3.1
  - h5py [required: Any, installed: 2.10.0]
    - numpy [required: >=1.7, installed: 1.18.1]
    - six [required: Any, installed: 1.14.0]
  - keras-applications [required: Any, installed: 1.0.8]
    - h5py [required: Any, installed: 2.10.0]
      - numpy [required: >=1.7, installed: 1.18.1]
      - six [required: Any, installed: 1.14.0]
    - numpy [required: >=1.7, installed: 1.18.1]
  - keras-preprocessing [required: >=1.0.5, installed: 1.1.0]
    - numpy [required: >=1.7, installed: 1.18.1]
    - six [required: Any, installed: 1.14.0]
  - numpy [required: >=1.9.1, installed: 1.18.1]
  - pyyaml [required: Any, installed: 5.3]
  - scipy [required: >=0.14, installed: 1.4.1]
    - numpy [required: >=1.13.3, installed: 1.18.1]
  - six [required: >=1.9.0, installed: 1.14.0]
pandas==1.0.1
  - numpy [required: >=1.13.3, installed: 1.18.1]
  - python-dateutil [required: >=2.6.1, installed: 2.8.1]
    - six [required: >=1.5, installed: 1.14.0]
  - pytz [required: >=2017.2, installed: 2019.3]

We can see that pipenv keeps track of our dependency packages, as well as their dependencies. We can also see this graph inverted by using the pipenv graph --reverse command instead.
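
In the reversed graph, each sub-dependency is listed together with the packages that require it; for pytz, the output would look roughly like this:

pytz==2019.3
  - pandas==1.0.1 [requires: pytz>=2017.2]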

Moreover, we can run pipenv commands from a terminal without entering the pipenv shell (and thus the virtual environment) by prefixing the command with pipenv run.
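
For instance, we can run a script inside the project's virtual environment without activating the shell first (main.py is just a placeholder name here):

pipenv run python main.py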

Lastly, we can also fix the version of Python used in the virtual environment by inserting the following into the Pipfile: 

[requires]
python_version = "3.7"

Then the virtual environment is limited to Python 3.7, and pipenv will respect that.

Now all this sounds great, but I personally had problems when using pipenv. It simply failed to import some packages even though it had seemed to work fine before, and altogether I didn't have the feeling that the package is stable and bug-free. It certainly didn't help that I did not see any activity on the package's GitHub for some time. So I looked elsewhere, and I found… pip-tools.

2) pip-tools

pip-tools is the oldest project of the three, the one I actually use, and the one I have had the least problems with. Its functioning is based on two commands: pip-compile and pip-sync. It does not create a virtual environment as pipenv does, so we need to install it into each virtual environment or Docker container:

pip install pip-tools 

Then we create a file containing the packages we need. By convention, this file is called requirements.in. We can specify version intervals or exact versions; here I specify intervals so that finding compatible dependencies is easier. (Of course, it does not make a difference with these two example packages.) So we create a requirements.in that only contains the packages we want to import (but not their dependencies):

$ cat requirements.in
pandas>1.0.0
keras>2.3

Now we run the pip-compile command. This takes the packages and version constraints included in requirements.in, chooses compatible versions of them and their dependencies, and writes them all to the requirements.txt file.

$ pip-compile requirements.in
#
# This file is autogenerated by pip-compile
# To update, run:
#
# pip-compile requirements.in
#
h5py==2.10.0 # via keras, keras-applications
keras-applications==1.0.8 # via keras
keras-preprocessing==1.1.0 # via keras
keras==2.3.1 # via -r requirements.in (line 2)
numpy==1.18.2 # via h5py, keras, keras-applications, keras-preprocessing, pandas, scipy
pandas==1.0.3 # via -r requirements.in (line 1)
python-dateutil==2.8.1 # via pandas
pytz==2019.3 # via pandas
pyyaml==5.3.1 # via keras
scipy==1.4.1 # via keras
six==1.14.0 # via h5py, keras, keras-preprocessing, python-dateutil

Now a requirements.txt file has been created as well. Next, we can use the pip-sync command to install the packages included in the requirements.txt file. This is similar to the pip install -r requirements.txt command, but it will also delete all installed packages that are not included in the requirements.txt. Also, since pip-compile takes an already existing requirements.txt into account, we can rerun the command without fearing that pip-tools will make undesired changes to the file. Finally, to upgrade a single package, we can use the pip-compile --upgrade-package <package> command; if we run pip-compile --upgrade, pip-tools upgrades all packages at once.
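
For example, to pull in a pytz bug-fix release (as in the scenario from the beginning of this post) while leaving everything else pinned:

pip-compile --upgrade-package pytz   # upgrade only pytz, within compatible bounds
pip-sync requirements.txt            # make the environment match the file exactly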

Finally, if we ask pip-tools to install packages with incompatible versions, it will simply fail with an error message saying that it could not find compatible versions to resolve the dependencies.

Personally, I prefer pip-tools to pipenv. pip-tools seems quicker and more stable, while pipenv sometimes fails to import packages that it did import before. Also, I haven't seen any updates on the pipenv GitHub for about a year now. But there is another, younger packaging project:

3) poetry

Poetry is the newest of the three packaging tools and, similarly to pipenv, it also creates a virtual environment. It needs only one file to create the virtual environment and install the packages with dependency resolution. By convention, this file is called pyproject.toml. This one file is meant to replace the setup.py file, the requirements.txt file, etc.

One example of this is: 

[tool.poetry] 
name = "test-poetry"
version = "0.1.0"

[tool.poetry.dependencies]
python = "^3.7"
pandas = ">=1.0"
keras = ">=2.3"

[tool.poetry.dev-dependencies]

Now we can install the packages with the poetry install command, and add new packages (optionally with a version constraint) with the poetry add command.
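
For instance, roughly the following commands would recreate our example project (the constraint syntax mirrors the pyproject.toml above):

poetry install              # create the virtual environment and install everything
poetry add "pandas>=1.0"    # add a package with a version constraint
poetry add pytest --dev     # add a development-only dependency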

Conclusion

  • pipenv gives an elegant solution for creating a virtual environment and managing package installs with dependency resolution in one step. However, it does not always find compatible packages, it sometimes fails to import packages, and I think there are some unresolved compatibility issues with the package itself. Its GitHub has not been updated for almost a year now, although not all issues/bugs are resolved. Finally, it is not part of the Python packaging tool recommendations now and seems to be slower than the other solutions.

  • pip-tools does not create a virtual environment; it only focuses on installing packages and checking compatible dependencies. However, it does this task quite well. I also find it easier to add pip-tools to existing projects than to integrate the other two packaging tools. It is a recommended Python packaging tool.

  • poetry is the newest of these three packages. It creates a virtual environment just as pipenv does and seems to resolve some of the issues pipenv had. It is also recommended.
