Abstract

Motive

AWS Lambda (and, by assumption, Azure Functions and GCP Cloud Functions) environments are not easily, if at all, reproducible during local development.

Recently, for a personal Python project, the packaged Lambda layer started hitting new segmentation-fault issues after I set up a new development environment. The code runs fine locally and partially on AWS; however, function calls related to boto3 and s3fs (suspected) cause a segmentation fault.

This led to a search for a deployment method that is still cheap (Lambda has an always-free tier) while maintaining environment reproducibility.

Exploration

Deployment via container

Caveat

  • Since containers come with the extra cost of storing system information (the Linux setup), they will always be larger to deploy. The question is by how much, and whether we can still stay within the free tier.

Size Comparison

Limits on the AWS Platform

lambda

Python application packaging is known to be bloated compared to other languages. For this personal project the layer easily reached 70 MB zipped and 184 MB unzipped, which is already close to the 250 MB limit quoted below.
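For reference, a layer like this is built roughly as follows; the requirements.txt and output name are assumptions, but the python/ directory is the layout Lambda expects for Python layers.

```
# install dependencies into the directory layout Lambda expects for Python layers
pip install -r requirements.txt -t python/

# zip it up; this archive is the ~70 MB artifact mentioned above
zip -r9 layer.zip python/
```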

NOTE

At this point I had also kept boto3, which should be redundant since Lambda provides boto3 by default, but I included it to debug the segmentation fault. Below are the top package sizes obtained by running du -hs * | sort -hr inside the package installation folder. The pip option --compile made no difference.

| Size | Lib         |
| ---- | ----------- |
| 65M  | pandas      |
| 41M  | numpy       |
| 26M  | botocore    |
| 25M  | numpy.libs  |
| 12M  | lxml        |
| 7.5M | fastparquet |
| 6.4M | aiohttp     |
| 5.3M | cramjam     |
| 2.8M | tzdata      |
| 2.8M | pytz        |
| 2.7M | yaml        |
| 1.5M | fsspec      |
| 1.2M | bs4         |
| 1.1M | html5lib    |
| 1.1M | yarl        |

TIP

Also, Lambda has no GPU option (not that one should run compute-intensive applications on Lambda anyway), so the concern of installing 2 GB of PyTorch or TensorFlow is a non-issue.

You can add up to five layers to a Lambda function. The total unzipped size of the function and all layers cannot exceed the unzipped deployment package size quota of 250 MB.

docker container image

According to the AWS official documentation:

Lambda supports a maximum uncompressed image size of 10 GB, including all layers.

To make the image compatible with Lambda, you must include a runtime interface client for your language in the image (for non-AWS base images: pip install awslambdaric).
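A minimal sketch of what this looks like with a non-AWS base image; the handler module name (handler.lambda_handler) and file layout are assumptions:

```
FROM python:3.12-slim
WORKDIR /app

# install project dependencies plus the Lambda runtime interface client
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt awslambdaric

COPY handler.py .

# let the runtime interface client drive the (assumed) handler function
ENTRYPOINT ["python", "-m", "awslambdaric"]
CMD ["handler.lambda_handler"]
```

For local testing, AWS also provides a Lambda runtime interface emulator that can wrap this entrypoint.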

Note that the smaller the image, the more likely one is to run into compatibility issues. The table below compares artifact sizes for Python 3.12 on my personal project.

| base image | base size | pip install size | notes |
| --- | --- | --- | --- |
| public.ecr.aws/lambda/python:3.12 | 532 MB | 258 + 342 MB (prod + dev) | default aws python image |
| python:3.12 | 1.01 GB | ? | debian based |
| python:3.12-slim | 123 MB | ? | does not contain a lot of de facto debian packages |
| python:3.12-alpine | 48 MB | ? | alpine is the smallest, with near-zero additional tooling (not even git). in 2020 pip installs were said to take 50x longer (no wheels for musl), but that is no longer the case with alpine-specific wheels. forum source |
| ghcr.io/astral-sh/uv:python3.12-bookworm | 1.04 GB | ? | |
| ghcr.io/astral-sh/uv:python3.12-bookworm-slim | 158 MB | | |
| ghcr.io/astral-sh/uv:python3.12-alpine | 83 MB | | |
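Base sizes like these can be reproduced locally with the standard Docker CLI; the image tag here is just an example:

```
# pull the image and print its local size
docker pull python:3.12-slim
docker images python:3.12-slim --format "{{.Repository}}:{{.Tag}} {{.Size}}"
```

Note that registries report compressed sizes, while docker images shows the uncompressed size on disk.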

There are also distroless ("flavourless") containers that essentially contain only a binary boot-up point, but those typically don't work well with Python, since packages often have many system build dependencies. (Maybe it would work with uv? The uv distroless image is 34 MB, even smaller than python alpine. Need to read "put your uv project inside a docker container"; this is an example of how to achieve this.)
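As a sketch of the multi-stage direction that guide takes (assuming a standard uv project with pyproject.toml, uv.lock, and a main.py entry point):

```
# build stage: resolve and install dependencies with uv
FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim AS builder
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev
COPY . .

# runtime stage: ship only the project and its virtualenv
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /app /app
ENV PATH="/app/.venv/bin:$PATH"
CMD ["python", "main.py"]
```

A fully distroless runtime would additionally need a Python interpreter copied in, which is where it gets fiddly.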

NOTE

when time permits, let's also consider uv images for the dev workflow.

uv images have tags for flavourless (distroless), debian, and alpine variants.

more reading on lightweight python deployment

Container Image Registry Hosting

aws

ECR has a free tier of 50 GB for public image repositories but not for private ones, so it is less suitable for enterprise usage. Moreover, AWS Lambda can only pull from private repositories, so the cost is unavoidable outside of the 12-month free tier.

https://www.reddit.com/r/aws/comments/1358jqy/lambda_docker_with_public_ecr/
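For completeness, pushing an image to a private ECR repository for Lambda goes roughly like this; the repository name, account ID, and region are placeholders:

```
# create a private repository (one-time)
aws ecr create-repository --repository-name my-lambda-image

# authenticate docker against the registry
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# tag and push the locally built image
docker tag my-lambda-image:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-lambda-image:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-lambda-image:latest
```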

The plain storage cost of S3 (Standard) vs. ECR is roughly 1:4 (without considering inter-service transfer costs): ECR charges $0.10/GB-month against roughly $0.023/GB-month for S3 Standard.

For S3 (Infrequent Access) vs. ECR, the cost ratio would be about 1:7.

However, with requests taken into account (constantly updating images), ECR seems better for the development stage and for extra-large multi-service deployments (pulling images from within Lambda and Fargate is free), while S3 seems better for backup and internal usage.

other options outside of cloud and lambda context

https://gist.github.com/JakubOboza/fbd6259f5b6321f17e8c3cdb1b095004