AWS Lambda Primer With Ruby Using RedShift, Secrets Manager and S3

The last time I wrote about AWS Lambda was more than four years ago. That story involved some batch processing, with a very rough cost estimate of custom code processing versus AWS Lambda.

This time I am writing about my AWS Lambda experience with the Ruby runtime, hopefully sharing a not-so-obvious thing or two.

1. Basic scaffolding

Writing AWS Lambda functions requires you to define a static handler method. I decided to have a lambda_handler.rb file in the root folder, with everything else going inside the lib folder. Don’t forget to set your lambda handler name in the AWS console to lambda_handler.LambdaHandler.call.

# frozen_string_literal: true

require "honeybadger"
require "pg"

require_relative "utils"

Honeybadger.context \
  tags: "lambda, #{Utils.lambda_name}"

class LambdaHandler
  class << self
    def call(event:, context:)
      ...
    rescue StandardError => e
      Honeybadger.notify \
        e,
        sync: true,
        context: Utils.lambda_to_hb_context(context)
      raise
    end
  end
end

Already in this example, there’s a small lesson to learn about error reporting with the Honeybadger gem. Honeybadger is smart enough to realize when it’s being used from Rails or Sinatra. In those environments, it won’t do anything special about executing its async notifications. In all other cases (like being used from a Ruby CLI app) it will install a so-called at_exit hook to guarantee that all its async code is waited upon until it properly finishes. This, however, doesn’t work with AWS Lambda. I quickly realized that regular Honeybadger notifications are sent asynchronously and were not being delivered properly from within Lambda. Luckily, sync: true comes to the rescue.

Pro tip: Use Honeybadger.notify(..., sync: true, ...) when sending notifications from AWS Lambda.

2. Connecting to a RedShift

RedShift is based on PostgreSQL 8.0.2, so in order to access it from Ruby you should probably head straight for the pg gem. The first problem I bumped into was that the pg gem’s native extension didn’t want to compile. My build environment uses the lambci/lambda:build-ruby2.5 docker image from the lambci project, so fixing that was rather easy:

# my package build Makefile
docker run -v $$PWD:/var/task -it --rm lambci/lambda:build-ruby2.5 \
  /bin/bash -c 'yum -q -y install postgresql-devel && ...'

However, once I uploaded the zipped package to AWS and ran a test, I got a rather funny-looking error:

libpq.so.5: cannot open shared object file: No such file or directory - /var/task/vendor/bundle/ruby/2.5.0/extensions/x86_64-linux/2.5.0-static/pg-1.1.4/pg_ext.so

It seems that our pg native extension requires yet another shared object library, namely libpq.so.5. In order to fetch it, I went into that docker container:

docker run -v `pwd`:/var/task -it --rm lambci/lambda:build-ruby2.5 /bin/bash

From there I installed the required PostgreSQL dev libraries, built the required dependencies and checked the extension’s dependencies:

yum -y install postgresql-devel
bundle install --without development test --path vendor/bundle
readelf -d vendor/bundle/ruby/2.5.0/extensions/x86_64-linux/2.5.0-static/pg-1.1.4/pg_ext.so
Dynamic section at offset 0x2e3f0 contains 31 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libpq.so.5]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libgmp.so.10]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libcrypt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

Let’s find the location of that first dependency:

find / -name libpq.so.5
/usr/lib64/libpq.so.5

So then I had to figure out how to package that file into my Lambda and make sure the path to it is on the LD_LIBRARY_PATH environment variable. Luckily, Amazon made that quite easy and there are multiple options for it. Let’s first check some env vars from that docker image:

echo $LD_LIBRARY_PATH
/var/lang/lib:/lib64:/usr/lib64:/var/runtime:/var/runtime/lib:/var/task:/var/task/lib:/opt/lib
echo $PATH
/var/lang/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin

It seems there are a number of places where we can put our shared objects and binaries. The easiest options are either putting libpq.so.5 in the Ruby project’s root folder or creating a lib folder in the same place and slipping it in there. If you are a bit more ambitious, you can create a separate zip package and attach it to your lambda function as an AWS Lambda Layer. Just make sure your zip file structure looks something like this:

# layer.zip
+-- lib
  +-- libpq.so.5

Pro tip: package libpq.so.5 with your lambda code or have it in a layer.
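
To double-check that the library really ends up somewhere the loader looks, a small script can walk LD_LIBRARY_PATH. This is only a minimal sketch using the Ruby standard library; find_shared_lib is a hypothetical helper name of mine, and it looks at LD_LIBRARY_PATH only, not at the system default library locations.

```ruby
# frozen_string_literal: true

# Hypothetical helper: returns the directories from a colon-separated search
# path (LD_LIBRARY_PATH format) that contain the given shared library file.
def find_shared_lib(name, search_path = ENV.fetch("LD_LIBRARY_PATH", ""))
  search_path
    .split(":")
    .reject(&:empty?)
    .select { |dir| File.file?(File.join(dir, name)) }
end

if $PROGRAM_NAME == __FILE__
  hits = find_shared_lib("libpq.so.5")
  if hits.empty?
    puts "libpq.so.5 not found on LD_LIBRARY_PATH"
  else
    puts "libpq.so.5 found in: #{hits.join(', ')}"
  end
end
```

Run inside the build container (or from the handler itself while debugging), it should list /var/task/lib or /opt/lib once the file is packaged as described above.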

The last part of the RedShift puzzle is putting your Lambda into the VPC to be able to access the database. Later on this move will turn out to be a bit of a problem, but for now, all it takes is making sure the Lambda function is in the same VPC as RedShift, with subnets and security groups that allow access to port 5439, and, of course, a suitable lambda execution role. Here’s what the JSON policy should look like:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateNetworkInterface",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DeleteNetworkInterface",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs"
            ],
            "Resource": "*"
        }
    ]
}

3. Accessing the Secrets Manager and S3

You will probably want to store your RedShift credentials in some encrypted storage rather than keeping them hardcoded inside your lambda code (GitHub) or inside some environment variables (also GitHub, e.g. via a Terraform script). A good place to keep those RedShift credentials is AWS Secrets Manager, so let’s see what that code might look like:

# frozen_string_literal: true

require "json"
require "aws-sdk-secretsmanager"

class SecretsManager
  class << self
    attr_reader :db_config

    # Loads the Honeybadger API key and the RedShift credentials
    # from AWS Secrets Manager; secret names come from env vars.
    def init_secrets
      honeybadger_id = ENV.fetch("HONEYBADGER_ID")
      hb_secret =
        client.get_secret_value(secret_id: honeybadger_id)
      hb_config = JSON.parse(hb_secret.secret_string)
      ENV["HONEYBADGER_API_KEY"] = hb_config["api_key"]

      redshift_id = ENV.fetch("REDSHIFT_CREDENTIALS_SECRET")
      redshift_secret =
        client.get_secret_value(secret_id: redshift_id)
      @db_config =
        JSON.parse(redshift_secret.secret_string, symbolize_names: true)
    end

    def client
      @client ||=
        Aws::SecretsManager::Client.new \
          region: ENV.fetch("AWS_REGION", "us-east-1")
    end
  end
end

However, I quickly realized that once I ran the code above, my lambda started to time out.

Long story short (in reality it was a very long and painful debugging session), once I decided to put my lambda within the VPC, I lost access to the internet. Since AWS Secrets Manager is accessed via the internet, and my VPC didn’t have a NAT Gateway associated with it, I was in trouble.

Luckily there is a workaround for this called an interface Endpoint, which can be found under the VPC settings. Check this article for further details.

Once I got the AWS Secrets Manager code running, I ran into the same issue when accessing S3. The S3 service is also not accessible from within the VPC unless you either have a NAT Gateway or define another Endpoint, this time of the gateway type.

Pro tip: accessing Secrets Manager requires a NAT Gateway (using public internet) or interface Endpoint (preferable) once you put lambda inside the VPC
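
When a lambda inside a VPC just hangs until it times out, a quick connectivity probe can shorten the debugging session. Below is a minimal sketch using only Ruby’s socket library; endpoint_reachable? is a hypothetical helper of mine, and the default port and timeout are assumptions. It only tells you whether a TCP connection can be opened, not whether your IAM permissions are right.

```ruby
# frozen_string_literal: true

require "socket"

# Hypothetical helper: true if a TCP connection to host:port can be opened
# within `timeout` seconds, false on refusal, DNS failure or timeout.
def endpoint_reachable?(host, port = 443, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Example (not executed here), using the regional service hostnames:
#   region = ENV.fetch("AWS_REGION", "us-east-1")
#   endpoint_reachable?("secretsmanager.#{region}.amazonaws.com")
#   endpoint_reachable?("s3.#{region}.amazonaws.com")
```

Logging the result of such a check at the start of the handler fails fast with a clear message, instead of waiting for the whole lambda timeout to expire.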

Check Lambda VPC docs for some more sensible bits of advice on the subject.

4. Reusing the database connection

Reuse that single database connection between different lambda handler invocations. The Lambda Ruby runtime calls your handler in a loop synchronously, never in parallel. So there’s no need for any connection pooling, just make sure to reuse that one connection properly. Here’s an example of how to do it:

# frozen_string_literal: true

require "pg"
require "retryable"

class DatabaseHelper
  ...
  def run
    Retryable.retryable(
      tries: 3, on: PG::ConnectionBad
    ) do |retries, _|
      puts "db connection error, retry #{retries}" if retries.positive?
      db_conn = self.class.connection(force: retries.positive?)
      db_conn.transaction do |conn|
        do_crazy_stuff(conn)
      end
    end
  rescue PG::Error => e
    puts "failed to do crazy stuff: #{e.class}, #{e.message}"
  end
  ...
  class << self
    def connection(force: false)
      @connection =
        if force
          connect_to_db
        else
          @connection || connect_to_db
        end
    end

    def connect_to_db
      PG.connect(SecretsManager.db_config)
    end
  end

end

However, if there are more events to handle than a single lambda worker is able to process, the lambda scheduler will spawn more lambda instances and these will work in parallel. This behavior is regulated by the lambda concurrency setting, and it is best to cap it at the maximum number of connections your database allows (or at the limit of any other shared resource you might be accessing in a similar way).

Pro tip: on a single box lambda runtime executes your handler code in a loop, synchronously.
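
The reuse-and-reconnect idea can also be sketched without any gems. What follows is not the code from my lambda: it swaps the retryable gem for Ruby’s plain retry keyword, and ConnectionLost and ConnectionCache are hypothetical stand-ins (for PG::ConnectionBad and the helper class) so the sketch runs on its own.

```ruby
# frozen_string_literal: true

# Stand-in for PG::ConnectionBad so the sketch runs without the pg gem.
class ConnectionLost < StandardError; end

# Keeps one connection per process (i.e. per lambda instance) and
# reconnects when the cached connection turns out to be dead.
class ConnectionCache
  def initialize(max_tries: 3, &connector)
    @connector = connector
    @max_tries = max_tries
    @connection = nil
  end

  def with_connection
    tries = 0
    begin
      tries += 1
      @connection ||= @connector.call
      yield @connection
    rescue ConnectionLost
      @connection = nil # drop the dead connection, reconnect on retry
      retry if tries < @max_tries
      raise
    end
  end
end

# Usage: created once, outside the handler, so it survives invocations.
# In a real lambda the block would call something like PG.connect(...).
CACHE = ConnectionCache.new { Object.new }

def handle(_event)
  CACHE.with_connection(&:object_id)
end
```

Two consecutive handle calls return the same object id, which shows that the second invocation reused the connection created by the first one.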

Check AWS docs on the lambda concurrency or lambci’s GitHub repo for even more details.

While at it, you might take a quick look at my lambcli-ruby repo to get an idea of what that lambda runtime loop looks like. I copied the /var/runtime folder off the lambci/lambda:build-ruby2.5 docker image for easy inspection.

5. Zip package liposuction

The suggested way to make your zip package containing lambda code smaller is to move all your dependencies, shared object libraries and binaries into a separate layer.

However, I found there’s an even simpler way to trim down your zip archive by carefully inspecting what ends up inside the vendor/bundle folder:

1. exclude all your specs and native extension compiling artifacts
2. remove all extra instances of the pg_ext.so file (it’s 1 MB in size and can be found in three different places - two are redundant).

Here’s what my packaging command looks like:

zip -rq -9 "$(BASE)/$(PROJECT_NAME).zip" . \
  -x "spec/*" \
     "**/spec/*" \
     "vendor/bundle/ruby/2.5.0/gems/pg-1.1.4/lib/pg_ext.so" \
     "vendor/bundle/ruby/2.5.0/gems/pg-1.1.4/ext/*"

Pro tip: know what goes into your lambda package!
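
One way to spot redundant copies such as those extra pg_ext.so files is to group files by content digest. This is a minimal sketch using only the standard library; duplicate_files is a hypothetical helper, not something from my build scripts.

```ruby
# frozen_string_literal: true

require "digest"
require "find"

# Hypothetical helper: groups regular files under `root` by SHA-256 digest
# and returns only the groups with more than one file, largest files first.
def duplicate_files(root, min_size: 1)
  by_digest = Hash.new { |hash, key| hash[key] = [] }
  Find.find(root) do |path|
    next unless File.file?(path) && !File.symlink?(path)
    next if File.size(path) < min_size

    by_digest[Digest::SHA256.file(path).hexdigest] << path
  end
  by_digest.values
           .select { |paths| paths.size > 1 }
           .map(&:sort)
           .sort_by { |paths| -File.size(paths.first) }
end

# Example (not executed here):
#   duplicate_files("vendor/bundle").each do |paths|
#     puts "#{File.size(paths.first)} bytes x #{paths.size}: #{paths.join(', ')}"
#   end
```

Whether a duplicate can really be excluded still has to be checked by hand, since only one of the copies is the one that actually gets loaded at runtime.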

Making your code and package smaller makes deployment faster and code editing/testing inside the Cloud9 editor much more enjoyable.

That’s that for now, until the next time!



2019-05-14 21:55 +0200
