This post compares ways to manage package installation in Python. Our old friends, the pip and requirements.txt duo, work fine until they don't, so newer packaging tools address dependency management as well. Below I compare pipenv, pip-tools and poetry.
What problems can arise when using pip and requirements.txt?
To understand what problems can arise when using pip and the requirements.txt file, let's first talk through the typical way to use this duo in a project.
First, we create a virtual environment or a Docker container, then we pip install the packages we need. Let's say we use keras and pandas, installed by running the terminal commands pip3 install pandas and pip3 install keras.
When we run the pip3 list command, we see not only the installed packages, but also all their dependencies:
Package             Version
------------------- -------
h5py                2.10.0
Keras               2.3.1
Keras-Applications  1.0.8
Keras-Preprocessing 1.1.0
numpy               1.18.1
pandas              1.0.1
pip                 19.0.3
python-dateutil     2.8.1
pytz                2019.3
PyYAML              5.3
scipy               1.4.1
setuptools          40.8.0
six                 1.14.0
After having worked (and coded) for many productive hours, we want to send our project to production, so we might do a pip freeze. Pip freeze takes a snapshot of the environment at that moment in time. We can write all packages into the requirements.txt file with the pip3 freeze > requirements.txt terminal command. At this point, our packages are pinned (the exact version is indicated in the requirements.txt file).
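Based on the pip3 list output above, the generated requirements.txt would look roughly like this (pip freeze leaves out pip and setuptools themselves):

h5py==2.10.0
Keras==2.3.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
numpy==1.18.1
pandas==1.0.1
python-dateutil==2.8.1
pytz==2019.3
PyYAML==5.3
scipy==1.4.1
six==1.14.0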
Now our requirements.txt file contains the installed packages and all their dependencies. If we want to send our project to production, the next developer can simply run the pip install -r requirements.txt terminal command to install all packages listed in the .txt file. This seems easy enough, and at first glance it works fine.
But now let's imagine that one of the dependencies of pandas, say pytz, gets updated, and that the new version contains some bug fixes. Ideally, we would like to get this newest version of the sub-dependency (assuming it is still compatible with the pinned version of pandas), but at the same time we want to keep pandas at its pinned version.
Now let's push it even further and imagine the following two scenarios. First, imagine that the old version of pytz leads to a problem when we use pandas. We search for the bug and don't understand why our code worked fine in our environment two weeks ago but doesn't work now. Finally, after some frustration, we find the source of the problem: one of our sub-dependencies got updated, and we have to correct our requirements.txt file to contain the new version. This is worrying, because we need to manually change something that ideally we should not have to touch.
Okay, we might say, then let's pin only the versions of keras and pandas and clear all sub-dependencies from the requirements.txt file. But then imagine the second scenario, in which pandas and pytz both have a new version, and the pinned version of pandas is incompatible with the new version of pytz. The problem is that pip does not do compatibility checks. Manually adding, correcting or clearing packages in the requirements.txt file does not seem to be the solution. Ideally, we would like to do the following things simultaneously:
1) pin (fix) the versions of the packages we use;
2) leave the versions of the dependency packages unpinned, so that they are updated automatically when a new version comes out;
3) update the dependency packages only when they are compatible with the pinned versions of the packages we use.
New packaging tools aim to cover all three points. Let's look then at pipenv, pip-tools and poetry.
1) pipenv
Pipenv is a package that uses virtualenv and pip together: it combines the creation of a virtual environment with the package-installing tool. To install pipenv (on Mac), run the command:
brew install pipenv
Now that you have pipenv, running the pipenv shell command creates a virtual environment for your project (if there was none before). Since you are now inside the virtual environment (you are, because you are still in the pipenv shell), you can install your packages (and pin their versions) with the following commands: pipenv install keras==2.3.1 and pipenv install pandas==1.0.1.
Now we can see that pipenv first creates a Pipfile and a Pipfile.lock, and then each installed package is added to the Pipfile. The Pipfile is similar to the requirements.txt file; however, it contains no dependencies, only the packages we actually install. It also has a slightly different structure: instead of listing all packages the way the requirements.txt file does, it is written in TOML format, with sections such as [packages], [dev-packages] and [requires] separating different kinds of information. Suppose that we need a package only for development (for instance pytest) and not for deployment. We can then run pipenv install pytest --dev, and this command will add the pytest package to the [dev-packages] section of the Pipfile.
[requires]
python_version = "3.7"

[packages]
keras = "==2.3.1"
pandas = "==1.0.1"
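For illustration, after running pipenv install pytest --dev, the Pipfile would also gain a section along these lines (pipenv records an unpinned version by default):

[dev-packages]
pytest = "*"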
This is cool: with the pip method, if we wanted to separate the development and deployment packages, we would have had to maintain requirements-test.txt and requirements-dev.txt files separately, and we would install either all packages or only those included in the requirements-test.txt file, depending on what we need.
Okay, so now we want to send our project to production. By using the pipenv lock command, we do essentially the same thing as we did with the pip freeze command: we create a snapshot in time of our packages and sub-dependencies, and write them into the Pipfile.lock file. This file should never be edited manually.
The Pipfile.lock is written in JSON format. It contains all installed packages (those in the Pipfile plus their dependency packages), with their versions and unique hashes. (The latter are included to make sure that during deployment we install exactly the same packages as we did during development.) Our Pipfile.lock looks something like this:
{ "_meta": { "hash": { "sha256": "7a1f626d1e55bd1f569fec8b6422124a3a15466910039ee80adc7591e7664fc5" }, "pipfile-spec": 6, "requires": { "python_version": "3.7" }, "sources": [ { "name": "pypi", "url": "https://pypi.org/simple", "verify_ssl": true } ] }, "default": { "h5py": { "hashes": [ "sha256:063947eaed5f271679ed4ffa36bb96f57bc14f44dd4336a827d9a02702e6ce6b", .... ], "version": "==2.10.0" ..... } } }
The next developer can recreate the last successfully used, locked environment with the pipenv install --ignore-pipfile command. The --ignore-pipfile option tells pipenv to disregard the Pipfile and use the Pipfile.lock instead.
We can also use the pipenv install or the pipenv install --dev commands (the latter installs all packages, including those in the [dev-packages] section).
So how does pipenv differ from the duo of pip and requirements.txt? First, the Pipfile only contains the core packages we installed (not their dependency packages), so when we run the pipenv install command, we install our core packages together with the newest compatible versions of their dependency packages. When we run the pipenv lock command, these newest compatible versions of the dependency packages are written into the Pipfile.lock with their versions pinned. I say "compatible" versions because pipenv provides dependency resolution: it checks whether the newest versions of the core packages' dependencies are compatible with the core packages, and if not, it does not include them.
Finally, using pipenv gives us the flexibility of a deterministic rebuild: we can install all packages from the Pipfile.lock (with the pipenv install --ignore-pipfile command) instead of basing the build on the Pipfile.
pipenv also provides some extra features; for instance, we can visualise our core and dependency packages with the pipenv graph command:
Keras==2.3.1
  - h5py [required: Any, installed: 2.10.0]
    - numpy [required: >=1.7, installed: 1.18.1]
    - six [required: Any, installed: 1.14.0]
  - keras-applications [required: Any, installed: 1.0.8]
    - h5py [required: Any, installed: 2.10.0]
      - numpy [required: >=1.7, installed: 1.18.1]
      - six [required: Any, installed: 1.14.0]
    - numpy [required: >=1.7, installed: 1.18.1]
  - keras-preprocessing [required: >=1.0.5, installed: 1.1.0]
    - numpy [required: >=1.7, installed: 1.18.1]
    - six [required: Any, installed: 1.14.0]
  - numpy [required: >=1.9.1, installed: 1.18.1]
  - pyyaml [required: Any, installed: 5.3]
  - scipy [required: >=0.14, installed: 1.4.1]
    - numpy [required: >=1.13.3, installed: 1.18.1]
  - six [required: >=1.9.0, installed: 1.14.0]
pandas==1.0.1
  - numpy [required: >=1.13.3, installed: 1.18.1]
  - python-dateutil [required: >=2.6.1, installed: 2.8.1]
    - six [required: >=1.5, installed: 1.14.0]
  - pytz [required: >=2017.2, installed: 2019.3]
We can see that pipenv keeps track of our dependency packages, as well as their own dependencies. We can also view this graph inverted by using the pipenv graph --reverse command instead.
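In reverse mode, the leaf dependencies come first and the packages that require them are nested underneath. An abridged sketch of what this output looks like (the exact formatting may differ between versions):

$ pipenv graph --reverse
numpy==1.18.1
  - h5py==2.10.0 [requires: numpy>=1.7]
    - Keras==2.3.1 [requires: h5py]
  - pandas==1.0.1 [requires: numpy>=1.13.3]
...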
Moreover, we can run pipenv commands from a terminal without being in the pipenv shell (and thus in the virtual environment) by prefixing the command with pipenv run.
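For example, assuming our project has a (hypothetical) main.py entry point:

pipenv run python main.py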
Lastly, we can also fix the version of Python used in the virtual environment by inserting the following into the Pipfile:
[requires]
python_version = "3.7"
Then the virtual environment is limited to Python 3.7, and pipenv will respect that.
Now, all this sounds great, but I personally had problems when using pipenv. It simply failed to import some packages although it had seemed to work fine before, and altogether I did not have the feeling that the package was stable and bug-free. It certainly didn't help that I had not seen any activity on the package's GitHub for some time. So I looked elsewhere, and I found... pip-tools.
2) pip-tools
pip-tools is the oldest project of the three, the one I actually use, and the one I have had the fewest problems with. It is built around two commands: pip-compile and pip-sync. It does not create a virtual environment as pipenv does, so we need to install it into each virtual environment or Docker container:
pip install pip-tools
Then we create a file containing the packages we need; by convention, this file is called requirements.in. We can specify version intervals or exact versions. Here I specify intervals, so that finding compatible dependencies is easier (of course, with these two example packages it makes no difference). So we create a requirements.in that only contains the packages we want to import (but not their dependencies):
$ cat requirements.in
pandas>1.0.0
keras>2.3
Now we need to use the pip-compile command. This checks all packages with the versions given in requirements.in, chooses compatible versions and writes them into the requirements.txt file.
$ pip-compile requirements.in
#
# This file is autogenerated by pip-compile
# To update, run:
#
# pip-compile requirements.in
#
h5py==2.10.0 # via keras, keras-applications
keras-applications==1.0.8 # via keras
keras-preprocessing==1.1.0 # via keras
keras==2.3.1 # via -r requirements.in (line 2)
numpy==1.18.2 # via h5py, keras, keras-applications, keras-preprocessing, pandas, scipy
pandas==1.0.3 # via -r requirements.in (line 1)
python-dateutil==2.8.1 # via pandas
pytz==2019.3 # via pandas
pyyaml==5.3.1 # via keras
scipy==1.4.1 # via keras
six==1.14.0 # via h5py, keras, keras-preprocessing, python-dateutil
A requirements.txt file has now been created as well. Next, we can use the pip-sync command to install the packages included in the requirements.txt file. This is similar to the pip install -r requirements.txt command, but it also deletes all packages that are not included in the requirements.txt. Since pip-compile takes an already existing requirements.txt into account, we can rerun the command without fearing that pip-tools will make undesired changes to the file. To upgrade a single package, we can use the pip-compile --upgrade-package <package> command; if we run pip-compile --upgrade, pip-tools upgrades all packages at once. Finally, if we ask pip-tools to install packages with incompatible versions, it will simply fail with an error message saying that it could not find compatible versions to resolve the dependencies.
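A typical pip-tools loop on this example project might look like the following (the pandas upgrade is just an illustration):

$ pip-compile requirements.in              # resolve and pin everything into requirements.txt
$ pip-sync requirements.txt                # make the environment match requirements.txt exactly
$ pip-compile --upgrade-package pandas     # later: bump a single package
$ pip-sync requirements.txt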
Personally, I prefer pip-tools to pipenv. pip-tools seems quicker and more stable, while pipenv sometimes fails to import packages that it imported fine before. And, as mentioned, I haven't seen any updates on the pipenv GitHub for about a year now. But there is another, younger packaging project:
3) poetry
Poetry is the newest of the three packaging tools and, similarly to pipenv, it also creates a virtual environment. It needs only one file to create the virtual environment and install the packages with dependency resolution. By convention, this file is called pyproject.toml, and this single file is meant to replace the setup.py file, the requirements.txt file, and so on. Here is an example:
[tool.poetry]
name = "test-poetry"
version = "0.1.0"
[tool.poetry.dependencies]
python = "^3.7"
pandas = ">=1.0"
keras = ">=2.3"
[tool.poetry.dev-dependencies]
Now we can install the packages with the poetry install command and add new packages with the poetry add "<package> <version>" command.
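For example, to install everything from the pyproject.toml above and then add a development-only dependency (the pytest version constraint here is just an illustration):

$ poetry install
$ poetry add "pytest>=5.0" --dev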
Conclusion
- pipenv gives an elegant solution for creating a virtual environment and managing package installation with dependency resolution in one step. However, it does not always find compatible packages, it sometimes fails to import packages, and I think there are some unresolved compatibility issues with the package itself. Its GitHub has not been updated for almost a year now, although not all issues and bugs are resolved. Finally, it is not part of the current Python packaging tool recommendations and seems to be slower than the other solutions.
- pip-tools does not create a virtual environment; it focuses only on installing packages and checking compatible dependencies. However, it does this task quite well. I also find it easier to add pip-tools to existing projects than to integrate the other two packaging tools. It is one of the recommended Python packaging tools.
- poetry is the newest among these three packages. It creates a virtual environment just as pipenv does and seems to resolve some issues that pipenv had. It is also recommended.