Posts on Nisdom. As opposed to Wisdom.

Move to fly.io

Sat, 14 Oct 2023 22:08:32 +0200

After more than 10 years of hosting my hugo-based blog on one of DigitalOcean’s machines, I decided to move to a different hosting environment. For a decade, I ran this site on a DigitalOcean Ubuntu droplet with tightly secured nginx. This droplet, beyond hosting a simple static website, served as my experimentation platform — running 24/7, connected to the internet with a decent connection, and costing me only $6 a month. Over the years, it was a place for this software engineer to experiment and play. I believe that every software engineer should have the skills to create a website, serve it, set up certificates, configure DNS, and email, at a bare minimum.

Finally, I chose fly.io, a platform that utilizes Firecracker virtualization. For those who might not be up-to-date on this, Firecracker is a virtual machine monitor (VMM) developed by Amazon Web Services (AWS). It was initially created to replace QEMU (or a derivative they’ve used) and power AWS Lambda and Fargate products more efficiently. Later, they open-sourced it. Firecracker uses the Linux Kernel-based Virtual Machine (KVM) to create and manage microVMs. With this tool, you can host your services using resource-efficient containers, like Docker.

One of the very cool aspects of fly.io is its ability to put apps to sleep when you’re not using them. Even more impressively, it can spin them up in under a second, or even a few hundred milliseconds, even with hobby instances having just 1 shared CPU and 256 MiB of RAM!

Instead of spinning up nginx inside my container, I opted for Caddy as it seemed like a simpler solution. Notably, Caddy has no libc dependency, and configuring it is much simpler compared to nginx. To illustrate, here’s a Caddy configuration file for this site:

{
	auto_https off
}

http://nisdom.com {
	root * /usr/share/caddy
	file_server
}

I’ve turned off HTTPS in the Caddy configuration since fly.io handles that for you. Even without fly.io, Caddy is capable of automatic TLS certificate renewals, eliminating the need for manual cronjobs to generate Let’s Encrypt certificates.

The Dockerfile is equally simple (the public folder is where Hugo generates your static website, and the Caddyfile consists of the seven lines above):

FROM caddy:2.7.5

COPY ./public/ /usr/share/caddy/
COPY ./Caddyfile /etc/caddy/Caddyfile

Installing mysql2 Ruby gem on macOS

Sun, 19 May 2019 19:19:51 +0200

The other day I was installing the mysql2 gem on macOS for Ruby 2.6.2, something that was supposed to be a walk in the park. I knew I would most likely have some hiccups when compiling the gem’s native extension, but that usually and rather unglamorously boils down to finding the correct MySQL dev libraries. However, there were unexpected twists and turns.

Installing the mysql2 gem

$ asdf shell ruby 2.6.2
$ gem install mysql2
Building native extensions. This could take a while...
ERROR:  Error installing mysql2:
	ERROR: Failed to build gem native extension.
...
mysql client is missing. You may need to 'brew install mysql' or 'port install mysql', and try again.
...

Ok, so gem’s build script is politely telling me that I need to first install MySQL using the Homebrew package manager. Since I don’t need the whole database server but only some development libraries (will be using MySQL from a Docker container), I tried installing the usual mysql-devel.

$ brew install mysql-devel
...
Error: No available formula with the name "mysql-devel"

After some googling, I figured the MySQL client library was available at this mysql.com page. Luckily, there’s already a ready-made homebrew formula mysql-connector-c.

$ brew install mysql-connector-c
...
🍺 /usr/local/Cellar/mysql-connector-c/6.1.11: 79 files, 15.3MB
$ gem install mysql2
Building native extensions. This could take a while...
ERROR:  Error installing mysql2:
	ERROR: Failed to build gem native extension.
...
compiling statement.c
linking shared-object mysql2/mysql2.bundle
ld: library not found for -l-Wno-atomic-implicit-seq-cst
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [mysql2.bundle] Error 1

make failed, exit code 2

Dafuq?!

Linker trouble

The linker coldly reported a library not found error, so I had to take a longer look at the build log. I finally managed to find one rather interesting line towards the beginning of it: Using mysql_config at /usr/local/bin/mysql_config.

After some reading on how the mysql2 gem builds its native extension and checking the content of the mentioned mysql_config script, I suspected something might be wrong there - the linker flag “-l-Wno-atomic-implicit-seq-cst” just didn’t make any sense.

Broken mysql_config

A more detailed look at that config script revealed the culprit:

# /usr/local/bin/mysql_config
...
# Create options 
libs="-L$pkglibdir"
libs="$libs -l "
embedded_libs="-L$pkglibdir"
embedded_libs="$embedded_libs -l "
...

Lines with libs and embedded_libs contained errors - they were cut off after the -l parameter. I quickly tracked the error to the https://dev.mysql.com/downloads/connector/c repository where the problematic file was present (homebrew’s formula pulls the package from there). After some tinkering, I managed to produce the correct version of the mysql_config file:

# Create options 
libs="-L$pkglibdir"
libs="$libs -lmysqlclient -lcrypto -lssl"
embedded_libs="-L$pkglibdir"
embedded_libs="$embedded_libs -lmysqlclient -lcrypto -lssl"
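
To make the failure mode a bit more concrete, here is a minimal Ruby sketch that scans a flags string for a bare -l with no library name after it. It is not part of the mysql2 gem, and dangling_l_flags is a name I made up for illustration. Presumably, once such a bare -l reaches the linker command line, the next argument gets swallowed as the library name, which would explain the odd -l-Wno-atomic-implicit-seq-cst.

```ruby
# Hypothetical helper: returns the positions of every bare "-l" token,
# i.e. a -l flag that is not followed directly by a library name.
def dangling_l_flags(flags)
  tokens = flags.split
  tokens.each_index.select { |i| tokens[i] == "-l" }
end

broken = "-L/usr/local/lib -l "
fixed  = "-L/usr/local/lib -lmysqlclient -lcrypto -lssl"

puts "broken: #{dangling_l_flags(broken).inspect}"
puts "fixed:  #{dangling_l_flags(fixed).inspect}"
```

An empty result means every -l flag carries a library name.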

Almost done

Trying to install the gem now ends with yet another error:

$ gem install mysql2
...
ld: library not found for -lcrypto

This one is easy. We just need to tell the linker where to find the necessary libcrypto (part of openssl).

$ brew install openssl # if you haven't done that already
$ gem install mysql2 -- --with-ldflags=-L/usr/local/opt/openssl/lib

Alternative solution (TL;DR)

Another solution that doesn’t require fixing mysql_config is passing all the folders to the native extension’s build command. The mysql2 gem will then use those settings and not the ones provided by the mysql_config script:

$ gem install mysql2 -- \
  --with-ldflags="-L/usr/local/opt/openssl/lib \
                  -L/usr/local/opt/mysql-connector-c/lib -lmysqlclient -lcrypto -lssl" \
  --with-cppflags=-I/usr/local/opt/mysql-connector-c/include

AWS Lambda Primer With Ruby using the RedShift, Secrets Manager and S3

Tue, 14 May 2019 21:55:22 +0200

The last time I wrote about AWS Lambda was more than four years ago, and that story involved some batch processing with a very rough cost estimate of custom code processing vs AWS Lambda.

This time I am writing about my AWS Lambda experience using the Ruby runtime and hopefully sharing a not-so-obvious thing or two.

1. Basic scaffolding

Writing AWS Lambda functions requires you to define a static handler method. I decided to have a lambda_handler.rb file in the root folder, with everything else going inside the lib folder. Don’t forget to name your lambda handler in the AWS console as lambda_handler.LambdaHandler.call.

# frozen_string_literal: true

require "honeybadger"
require "pg"

require_relative "utils"

Honeybadger.context \
  tags: "lambda, #{Utils.lambda_name}"

class LambdaHandler
  class << self
    def call(event:, context:)
      ...
    rescue StandardError => e
      Honeybadger.notify \
        e,
        sync: true,
        context: Utils.lambda_to_hb_context(context)
      raise
    end
  end
end

Already in this example, there’s a small lesson to learn about error reporting with the Honeybadger gem. Honeybadger is smart enough to realize when it’s being used from Rails or Sinatra. In those environments, it won’t do anything special about executing its async notifications. In all other cases (like being used from a Ruby CLI app) it will install a so-called at_exit hook to guarantee that all its async code is waited upon until it properly finishes. This, however, doesn’t work with AWS Lambda. I quickly realized that regular Honeybadger notifications are executed asynchronously and were not being delivered properly within Lambda. Luckily, sync: true comes to the rescue.

Pro tip: Use Honeybadger.notify(..., sync: true, ...) when sending notifications from AWS Lambda.
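
Here is a toy model of why the sync flag matters. This is not Honeybadger’s actual implementation: ToyNotifier and its methods are made up for illustration, and the queue simply stands in for whatever a background worker would drain later. The assumption is that Lambda freezes the process once the handler returns, so anything still sitting in such a queue may never be sent.

```ruby
# Toy stand-in for an error reporter with async and sync delivery.
class ToyNotifier
  attr_reader :delivered

  def initialize
    @pending = Queue.new
    @delivered = []
  end

  # sync: true delivers before returning; otherwise the notice only
  # gets queued for a later flush (e.g. a worker thread or at_exit hook).
  def notify(message, sync: false)
    sync ? deliver(message) : @pending << message
  end

  # Drains the queue; in a frozen Lambda process this may never run.
  def flush
    deliver(@pending.pop) until @pending.empty?
  end

  private

  def deliver(message)
    @delivered << message
  end
end

notifier = ToyNotifier.new
notifier.notify("async error")
notifier.notify("sync error", sync: true)
puts notifier.delivered.inspect # only the sync notice has gone out so far
```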

2. Connecting to a RedShift

RedShift is based on PostgreSQL 8.0.2 and in order to access it from Ruby, you should probably head straight for the pg gem. The first problem I bumped into is that pg gem’s native extension didn’t want to compile. My build environment is using lambci/lambda:build-ruby2.5 docker images from the lambci project, so fixing that was rather easy:

# my package build Makefile
docker run -v $$PWD:/var/task -it --rm lambci/lambda:build-ruby2.5 \
  /bin/bash -c 'yum -q -y install postgresql-devel && ...'

However, once I uploaded the zipped package to AWS and ran a test, I got a rather funny looking error:

libpq.so.5: cannot open shared object file: No such file or directory - /var/task/vendor/bundle/ruby/2.5.0/extensions/x86_64-linux/2.5.0-static/pg-1.1.4/pg_ext.so

It seems that our pg native extension requires yet another shared object library i.e. libpq.so.5. In order to fetch it, I went into that docker container:

docker run -v `pwd`:/var/task -it --rm lambci/lambda:build-ruby2.5 /bin/bash

From there I installed the required PostgreSQL dev libraries, built the required dependencies and checked the extension’s dependencies:

yum -y install postgresql-devel
bundle install --without development test --path vendor/bundle
readelf -d vendor/bundle/ruby/2.5.0/extensions/x86_64-linux/2.5.0-static/pg-1.1.4/pg_ext.so
Dynamic section at offset 0x2e3f0 contains 31 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libpq.so.5]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libgmp.so.10]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libcrypt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

Let’s find the location of that first dependency:

find / -name libpq.so.5
/usr/lib64/libpq.so.5

So then I had to figure out how to package that file into my Lambda and make sure the path to it is on the LD_LIBRARY_PATH environment variable. Luckily, Amazon made that quite easy and there are multiple options for it. Let’s first check some env vars from that docker image:

echo $LD_LIBRARY_PATH
/var/lang/lib:/lib64:/usr/lib64:/var/runtime:/var/runtime/lib:/var/task:/var/task/lib:/opt/lib
echo $PATH
/var/lang/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin

It seems there are a number of places where we can put our shared objects and binaries. The easiest options would be either putting libpq.so.5 in the Ruby project’s root folder or creating a lib folder in the same place and slipping it in there. If you are a bit more ambitious, you will create a separate zip package and attach it to your lambda function as an AWS Lambda Layer. Just make sure your zip file structure looks something like this:

# layer.zip
+-- lib
  +-- libpq.so.5

Pro tip: package libpq.so.5 with your lambda code or have it in a layer.
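
A quick way to double-check the result is to walk the search path from inside the function. The following is just a sketch; find_shared_lib is a hypothetical helper of mine, not something the Lambda runtime provides.

```ruby
# Hypothetical helper: returns the first full path where `name` exists
# on a colon-separated search path, or nil when it is missing everywhere.
def find_shared_lib(name, search_path = ENV.fetch("LD_LIBRARY_PATH", ""))
  search_path.split(":")
             .reject(&:empty?)
             .map { |dir| File.join(dir, name) }
             .find { |candidate| File.exist?(candidate) }
end

path = find_shared_lib("libpq.so.5")
puts(path ? "found #{path}" : "libpq.so.5 is not on LD_LIBRARY_PATH")
```

Calling it once at the top of the handler file should make a missing library show up in the logs before the pg gem fails to load.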

The last part of the RedShift puzzle is putting your Lambda into the VPC to be able to access the database. Later on this move will turn out to be a bit of a problem, but for now, all it takes is making sure the Lambda function is in the same VPC as RedShift, with all the subnets and security groups set up to allow access to port 5439 and, of course, the right permissions on the lambda’s execution role. Here’s what the JSON policy should look like:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateNetworkInterface",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DeleteNetworkInterface",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs"
            ],
            "Resource": "*"
        }
    ]
}

3. Accessing the Secrets Manager and S3

You will probably want to store your RedShift credentials in some encrypted storage rather than keeping them hardcoded inside your lambda code (GitHub) or inside some environment variables (also GitHub via e.g. a terraforming script). A good place to keep those RedShift credentials is AWS Secrets Manager, so let’s see what that code might look like:

# frozen_string_literal: true

require "json"
require "aws-sdk-secretsmanager"

class SecretsManager
  class << self
    attr_reader :db_config

    def init_secrets
      honeybadger_id = ENV.fetch("HONEYBADGER_ID")
      hb_secret =
        client.get_secret_value(secret_id: honeybadger_id)
      hb_config = JSON.parse(hb_secret.secret_string)
      ENV["HONEYBADGER_API_KEY"] = hb_config["api_key"]

      redshift_id = ENV.fetch("REDSHIFT_CREDENTIALS_SECRET")
      redshift_secret =
        client.get_secret_value(secret_id: redshift_id)
      @db_config =
        JSON.parse(redshift_secret.secret_string, symbolize_names: true)
    end
    
    def client
      @client ||=
        Aws::SecretsManager::Client.new \
          region: ENV.fetch("AWS_REGION", "us-east-1")
    end
  end
end

However, I quickly realized that once I ran the code from above, my lambda started to time out.

Long story short (in reality it was a very long and painful debugging session), once I decided to put my lambda inside the VPC, I lost access to the internet. Since AWS Secrets Manager is accessed over the internet, and my VPC didn’t have a NAT Gateway associated with it, I was in trouble.

Luckily, there is a workaround for this called an interface Endpoint, which can be found under the VPC settings. Check this article for further details.

Once I got the AWS Secrets Manager code running, I ran into the same issue when accessing S3. The S3 service is also not accessible from within the VPC unless you either have a NAT Gateway or you have defined another Endpoint, but this time of a gateway type.

Pro tip: accessing Secrets Manager requires a NAT Gateway (using public internet) or interface Endpoint (preferable) once you put lambda inside the VPC

Check Lambda VPC docs for some more sensible bits of advice on the subject.

4. Reusing the database connection

Reuse that single database connection between different lambda handler invocations. Lambda Ruby runtime calls your handler in a loop synchronously, never in parallel. So there’s no need for any connection pooling, just make sure to reuse that one connection properly. Here’s an example of how to do it:

# frozen_string_literal: true

require "pg"
require "retryable"

class DatabaseHelper
  ...
  def run
    Retryable.retryable(
      tries: 3, on: PG::ConnectionBad
    ) do |retries, _|
      puts "db connection error, retry #{retries}" if retries.positive?
      db_conn = self.class.connection(force: retries.positive?)
      db_conn.transaction do |conn|
        do_crazy_stuff(conn)
      end
    end
  rescue PG::Error => e
    puts "failed to do crazy stuff: #{e.class}, #{e.message}"
  end
  ...
  class << self
    def connection(force: false)
      @connection =
        if force
          connect_to_db
        else
          @connection || connect_to_db
        end
    end

    def connect_to_db
      PG.connect(SecretsManager.db_config)
    end
  end

end

However, if there are more events to handle than a single lambda worker is able to process, the lambda scheduler will spawn more lambda instances and these will work in parallel. Such behavior is regulated by the lambda concurrency setting, and it is best to cap it at the maximum number of connections your database allows (or at the limit of any other shared resource you might be accessing in a similar way).

Pro tip: on a single box lambda runtime executes your handler code in a loop, synchronously.
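
The caching part of that helper can be tried out without a database. In this sketch FakeConnection is a made-up stand-in for PG.connect and ConnectionCache is a hypothetical name; the only point is to show that warm invocations get the same object back and that a forced reconnect replaces it.

```ruby
# Stand-in for PG.connect so the sketch runs without a database.
class FakeConnection
  @opened = 0

  class << self
    attr_reader :opened

    def open
      @opened += 1
      new
    end
  end
end

class ConnectionCache
  class << self
    # Reuses the cached connection unless a reconnect is forced.
    def connection(force: false)
      @connection = nil if force
      @connection ||= FakeConnection.open
    end
  end
end

first  = ConnectionCache.connection
second = ConnectionCache.connection              # warm invocation: reused
third  = ConnectionCache.connection(force: true) # e.g. after a connection error

puts "reused: #{first.equal?(second)}, replaced: #{!first.equal?(third)}"
puts "connections opened: #{FakeConnection.opened}"
```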

Check AWS docs on the lambda concurrency or lambci’s GitHub repo for even more details.

While at it, you might take a quick look at my lambcli-ruby repo to get an idea of what that lambda runtime loop looks like. I copied the /var/runtime folder off of the lambci/lambda:build-ruby2.5 docker image for easy inspection.

5. Zip package liposuction

The suggested way to make your zip package containing lambda code smaller is to move all your dependencies, shared object libraries and binaries into a separate layer.

However, I found there’s an even simpler way to trim down your zip archive by carefully inspecting what ends up inside the vendor/bundle folder:

exclude all your specs and native extension compiling artifacts
remove all extra instances of pg_ext.so file (it’s 1 MB in size and can be found in three different places - two are redundant).

Here’s what my bash packaging command looks like:

zip -rq -9 "$(BASE)/$(PROJECT_NAME).zip" . \
  -x "spec/*" \
     "**/spec/*" \
     "vendor/bundle/ruby/2.5.0/gems/pg-1.1.4/lib/pg_ext.so" \
     "vendor/bundle/ruby/2.5.0/gems/pg-1.1.4/ext/*"

Pro tip: know what goes into your lambda package!
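
One way to spot such redundant copies is to list every file with a given name under the bundle folder. This is only a sketch; copies_of is a hypothetical helper, and the vendor/bundle path is the one from my project layout.

```ruby
# Hypothetical helper: lists every copy of `file_name` under `root`
# together with its size in bytes, largest first.
def copies_of(file_name, root)
  Dir.glob(File.join(root, "**", file_name))
     .map { |path| [path, File.size(path)] }
     .sort_by { |path, size| [-size, path] }
end

copies_of("pg_ext.so", "vendor/bundle").each do |path, size|
  puts format("%8d  %s", size, path)
end
```

Anything that shows up more than once is a candidate for the zip exclude list.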

Making your code and package smaller makes deployment faster and code editing/testing inside the Cloud9 editor much more enjoyable.

That’s that for now, until the next time!

Minimalistic logging from Docker containers

Fri, 10 Apr 2015 12:53:00 +0200

Reading logs from a Docker container can be done using docker logs container_id. This simply fetches logs present at the time of execution from the container’s STDOUT and STDERR streams. If you want to, however, transform those logs, and send them to a central repository using e.g. logstash, there are a number of options to choose from. Here I’ll be describing the simplest case of writing logs to the /dev/log socket.

Writers will write

A minimalistic scenario for collecting logs expects that your software, running inside the Docker container, writes to syslog using the Unix domain socket /dev/log. With a Ruby app running inside the Docker container you can use the Syslog::Logger class like this:

log = Syslog::Logger.new('my-awesome-app')
log.info('say something nice')

With a Node.js app and some help from the ain2 package you might end up with:

var SysLogger = require('ain2')
var log = new SysLogger({tag: 'my-cool-app', path: '/dev/log'});
log.setTransport('unix_dgram');
log.info('say something sweet');

If you have a properly configured and running (r)syslog daemon, in both cases you will get something like this in /var/log/messages:

Apr 10 18:31:25 ip-123-21-31-41 my-awesome-app[1101]: say something nice
Apr 10 18:31:25 ip-123-21-31-41 my-cool-app[1105]: say something sweet

The same effect can be achieved by using the logger tool. Both previously mentioned libraries, as well as logger, use the syslog(3) API call and write directly to the /dev/log socket (if available).

And readers will read

Now comes the fun part. I promised to explain how this logging stuff and syslog play with Docker. To be able to send logs to an IPC socket, somebody has to create it first. Let that somebody be an rsyslog daemon running on the Docker host. For this to work we need to have the following line in /etc/rsyslog.conf uncommented:

$ModLoad imuxsock

The last thing to do is to bind mount the /dev/log socket on Docker run using something like:

docker run -v /dev/log:/dev/log my-image

One important thing to notice is that once we restart the rsyslog daemon, for whatever reason, all apps running inside the Docker containers won’t be able to write to syslog anymore. The reason for that is dead simple - the socket to which all apps were bound is now gone and replaced by another one. If we want to write logs to that new socket, we should probably restart our apps.
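
An app could at least notice that it is holding on to a dead socket. The sketch below is an assumption-laden probe rather than a recipe: syslog_socket_alive? is a name I made up, it assumes /dev/log is a datagram socket (as with rsyslog’s imuxsock), and it treats any system call error on connect as "not alive".

```ruby
require "socket"

# Hypothetical helper: true when a datagram socket at `path` still has a
# listener behind it, false when the path is missing or stale.
def syslog_socket_alive?(path = "/dev/log")
  client = Socket.new(:UNIX, :DGRAM)
  client.connect(Addrinfo.unix(path, :DGRAM))
  true
rescue SystemCallError
  false
ensure
  client&.close
end

alive = syslog_socket_alive?
puts(alive ? "syslog socket is alive" : "syslog socket is gone or stale")
```

What to do when the probe fails (reopen the logger, exit and let the container restart) depends on the app.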

A matter of logs

Wed, 04 Mar 2015 14:15:57 +0100

Logs are a very important part of any serious software system. They provide invaluable insight into the current and past state of the system. Simply saving them to a disk or persisting them in any other crude way will probably deprive you of discovering anything interesting in them. The purpose of this article is to describe one such offline logs collection and processing system I created years ago and to sketch possible real-time solutions using technologies available today.

Problem description and motivation

The story begins a couple of years ago when I was working on some server-side code that needed to process a number of logs streaming from the desktop app. These logs contained various time-stamped events and, since at the time I was using Heroku to run web services, I had to be extra careful about the running costs. I also didn’t want to spend too much time on administration of some server software running on EC2 instances. Luckily, business requirements at the time didn’t call for a realtime solution, so in the end I decided to go with an offline one.

Simple(db) solution

Since there were a number of users running the desktop app concurrently, a relatively large number of events were generated, something close to 5K per second. From my experience, that number of concurrent calls on an HTTP endpoint wouldn’t even work on Heroku (for comparison, StackOverflow had around 3000 req/s in 2014). Since this was a desktop app, the decision was made to directly upload compressed batches of events (serialized as JSON data) to S3. When the upload of a single batch was finished, the app would still call a Heroku web service to store a timestamp and a pointer to the uploaded S3 file in SimpleDb. Batching helped cut down requests to less than 100 per second, and writing metadata to SimpleDb was done out-of-band with the help of a queue and some background workers. This solution was in the end still calling a web service hosted on Heroku, but it was a much leaner one than it could have been.

At the time, the new-object-created event wasn’t available on S3 and even DynamoDB wasn’t there. SimpleDb was the only hosted columnar data store, with a very reasonable price-tag and bearable constraints for the offline processing purpose. If there had been such an S3 event, we could have skipped Heroku completely.

The next thing that needed to be done was offline processing of those events. For this purpose I created a daily cron job (running at night though) that was spawning some Ruby code. First it queried SimpleDb by grouping events by timestamp for the previous day. Then it pushed those events to the SQS instance served by an arbitrarily large set of listeners. Listeners were pulling related blobs of data from S3, doing some transformations and finally updating various counters in MySQL.
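
The selection step of that nightly job can be sketched in a few lines of plain Ruby. Everything here is illustrative rather than the original code: BatchRecord, batches_for_previous_day and the sample keys are made up, and an in-process Queue stands in for SQS.

```ruby
require "date"

# Hypothetical metadata record: upload time plus a pointer to the batch on S3.
BatchRecord = Struct.new(:uploaded_at, :s3_key)

# Picks the batches uploaded on the day before `today`.
def batches_for_previous_day(records, today)
  yesterday = today - 1
  records.select { |record| record.uploaded_at.to_date == yesterday }
end

records = [
  BatchRecord.new(Time.new(2015, 3, 2, 23, 59), "events/a.json.gz"),
  BatchRecord.new(Time.new(2015, 3, 3, 0, 1),   "events/b.json.gz"),
  BatchRecord.new(Time.new(2015, 3, 3, 18, 30), "events/c.json.gz"),
  BatchRecord.new(Time.new(2015, 3, 4, 2, 0),   "events/d.json.gz")
]

queue = Queue.new # stands in for SQS
batches_for_previous_day(records, Date.new(2015, 3, 4)).each do |record|
  queue << record.s3_key
end

puts "queued #{queue.size} batches for the listeners"
```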

Here’s a diagram of the whole scaffolding:

I hope I managed to clearly describe how the previous system was created. Now I am fast forwarding to see how I could build a similar, real-time system with the current technologies.

Fast forward to today

Every such journey should start with a little research. You don’t want to be a system architect stuck with a hammer and a saw; you’d better upgrade your toolbelt occasionally. After a relatively short research on the subject, I was amazed at how enormous the real-time logs/events processing area was and how many software products existed in this space. And by products I don’t mean the traditional ones like rsync or the syslog-based rsyslog or syslog-ng. I confess, it took me more than a day to grasp all the existing software products, what they actually represented and how they fitted inside their respective puzzles.

Producers

If I want to handle my logs in real-time, I obviously have to forget about uploading of compressed batches to S3 and all that offline processing.

I learned that I don’t deal with logs, but events, and the thing I would be doing is real-time events ingestion and processing. One useful acronym is ETL which stands for extract, transform and load, a typical thing which event consumers do.

We are dealing with roughly 5K events per second, so what comes to mind is that desktop apps could push events to some messaging i.e. queueing system. The usual suspects are RabbitMQ, 0MQ, Redis etc. They all could handle that much traffic without even a blink, and if we needed more, we could always put some reverse proxy in front and happily continue. I would personally go with Redis since it’s very easy to configure and there’s a brilliant reverse proxy, twemproxy (aka nutcracker), that supports the Redis protocol, in case I ever needed to create a Redis cluster. The reasoning behind such messaging systems is to isolate message producers (in our case desktop apps) from message consumers (our Ruby scripts running on EC2 instances). I previously used the highly available S3 service and the SimpleDB service (unfortunately, not so highly available) to achieve a similar sort of isolation.

But I discovered there are even cooler toys out there called Apache Kafka and Amazon Kinesis. The main difference between Kafka/Kinesis and those more traditional messaging systems, according to their documentation, is that they are built from the ground up with distribution in mind. This usually means seamless horizontal scaling under much higher loads.

It seems that Kinesis is less flexible than Kafka, but Kinesis has some other advantages that matter to me even more. If I wanted to have highly-available Kafka cluster, I would need to maintain a number of EC2 instances running Kafka and a separate Zookeeper instance used by Kafka for coordination among the nodes. With Kinesis I don’t need to worry about any of that cluster maintenance. It can even endlessly scale with almost no administrative burden. So I am perfectly happy to continue with the hassle-free Kinesis and write events directly from the desktop app to the Kinesis stream.

Consumers

The second part of the equation is consumption of those messages. I need to ingest each message, transform it a bit and then store it somewhere safe for later access. If data represents some counter, I might update its value in the database, and if it’s some text and I need to search on it later, I could store it to ElasticSearch. As I said, previously I used a Ruby script whose execution was triggered once a day by a cron job. I could use that same Ruby script here as well, but this time it wouldn’t be started from a cron job, but from some other code listening to events arriving from the Kinesis stream. Amazon even provides a server implementation that works on top of the Kinesis Client Library, called MultiLangDaemon, that simplifies development of Kinesis record processors in languages other than Java. But I have my eyes set on something else.

As with messaging products, there are a number of choices in the logs/events collectors/processors arena. At least enough to spin my head once more - Apache Storm, Flume, logstash, fluentd, Amazon Lambda etc. Although these products differ in many ways, they are similar enough for what I’m trying to achieve that I could use any of them. Apache Storm seems to be very powerful and quite well supported by Amazon. On the other hand, there’s a brand new Amazon offering called Amazon Lambda, the holy grail of no-hassle solutions (which I always preferred, being a developer-first person). Lambdas would even relieve me of having EC2 instances for events processing. So all I need to do is rewrite my Ruby transformations into JavaScript (Amazon uses Node.js behind the curtains) and unleash the magical power of Lambda. Sweet!

Cost estimate

It seems that I managed to put all those different pieces together and to at least imagine how I would turn my offline events processing into a real-time analytics solution, all of it using Amazon’s hosted solutions. The only remaining thing to do is to get a rough estimate of costs. I figured I would calculate only how much I would pay monthly for the use of Kinesis and Lambda. My original ETL code was transferring data to MySQL (RDS) and S3 in the “L” phase of ETL. This is something I would still be doing with the Kinesis/Lambda solution. The only saving I would be able to achieve is the removal of $500/month worth of EC2 instances crunching the events, now replaced with Lambda.

Kinesis shard-hour cost

I already said that every second we produce around 5K events. Each such event contains around 1K in payload, which makes 5 MB/s of data input. Since one shard in a Kinesis stream has a capacity of 1 MB/s, I would need 5 such shards. This is roughly $55.80 per month.

Kinesis PUT record cost

The next cost is related to PUT records. The number of events per month is 5000 * 60 * 60 * 24 * 31 i.e. 13,392,000,000. A million PUT records cost $0.028, so we end up with an additional $375 per month. Since Kinesis messages can hold up to 50K in size, we might once again batch our events and write e.g. 10 events at once. This would make the number of PUT records 500 per second and the system would still behave as a real-time one. So instead of adding $375, we would have an extra $37.50 per month. Notice that the cost of shard-hours hasn’t changed with batching.

Lambda requests count cost

Since I decided to batch the events, I ended up with 1,339,200,000 lambda requests. First 1,000,000 requests are free and each next million costs $0.20. Add another $268.

Lambda duration cost

Now things become a little bit harder regarding the cost estimation. I would need to know upfront how much memory my code would need on Lambda and how long it would execute. This is, of course, impossible without really trying it out. Arriving at this point also makes it painfully obvious that I’ll still need to pay what I thought I saved by batching those events. I will make a really modest estimate here and suppose I would need only 128 MB of memory (the cheapest Amazon Lambda tier) and that my code would need 150 ms to process each single event i.e. 1.5 seconds for the whole batch. This makes a total of 2,008,800,000 seconds of work per month (first 3,200,000 seconds are free). Since the price per 100 ms is $0.000000208, we end up with $4178 of additional monthly cost (about $4172 once the free seconds are subtracted). Oops.
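
The whole estimate fits into a short Ruby script, which makes it easy to play with the assumptions. The prices are the ones quoted above, except for the shard-hour price of $0.015, which I am assuming because it is what the $55.80 figure works out to for 5 shards over a 31-day month. With 1,339,200,000 batches at 1.5 seconds each the work adds up to 2,008,800,000 seconds; subtracting the free seconds gives roughly $4172, while skipping the free tier gives the $4178 figure.

```ruby
EVENTS_PER_SECOND         = 5_000
SECONDS_PER_MONTH         = 60 * 60 * 24 * 31
BATCH_SIZE                = 10
SHARDS                    = 5
SHARD_HOUR_PRICE          = 0.015 # assumption, derived from $55.80
PUT_PRICE_PER_MILLION     = 0.028
REQUEST_PRICE_PER_MILLION = 0.20
FREE_REQUESTS             = 1_000_000
PRICE_PER_100MS           = 0.000000208 # 128 MB tier
FREE_SECONDS              = 3_200_000
SECONDS_PER_BATCH         = 1.5

# Returns the monthly cost components in dollars.
def monthly_costs
  events       = EVENTS_PER_SECOND * SECONDS_PER_MONTH
  batches      = events / BATCH_SIZE
  busy_seconds = batches * SECONDS_PER_BATCH
  {
    shard_hours: SHARDS * 24 * 31 * SHARD_HOUR_PRICE,
    put_unbatched: events / 1_000_000.0 * PUT_PRICE_PER_MILLION,
    put_batched: batches / 1_000_000.0 * PUT_PRICE_PER_MILLION,
    lambda_requests: (batches - FREE_REQUESTS) / 1_000_000.0 * REQUEST_PRICE_PER_MILLION,
    lambda_duration: (busy_seconds - FREE_SECONDS) * 10 * PRICE_PER_100MS,
    lambda_duration_no_free_tier: busy_seconds * 10 * PRICE_PER_100MS
  }
end

monthly_costs.each do |name, dollars|
  puts format('%-30s $%.2f', name, dollars)
end
```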

Kinesis/Lambda costs recap

A cost of about $100 per month for Kinesis turned out to be a real bargain. It saves me from having at least a two-node Redis cluster and an extra reverse proxy instance, all of that just to achieve HA properties at least modestly comparable to those of Kinesis. Lambda, however, turned out to be too pricey for my budget, even when I was estimating with the cheapest tier.

Summary

To recap, I would be definitely pushing my data to Amazon Kinesis stream, but instead of Lambda I would be running e.g. a single c4.2xlarge instance ($345 monthly cost) with MultiLangDaemon and my slightly modified Ruby code. My guess is that this single machine would be able to process all 5 shards concurrently.

The new solution managed to replace storing data to S3 and to remove most of the offline logs-processing EC2 instances, with the costs remaining roughly the same. And yes, I managed to replace my poxy 24-hours-later analytics with a realtime solution. How cool is that?!

It seems that there are some new and shiny toys to play with on AWS. And once again, they come to the rescue from the gruesome maintenance tasks of running software on EC2s, at least for the average back-end developer. But not all of them are for everyone and there is a hefty price tag attached to that Unbearable Lightness of Lambda.

An honorable mention to ElasticSearch ELK stack

Although I love ElasticSearch and its whole ELK stack, logstash (which is btw. the “L” in the ELK and a very, very cool product in its own right) would be more appropriate if we were dealing with raw logs instead of events. In order to use logstash I would need to write a plugin to deal with events sent by the desktop apps (some boilerplate plus the existing Ruby code). This all seems like overkill compared to Amazon’s solution. In any other case where I would need to ingest more structured logs (like stuff coming from web servers), make them available for full-text search and even visualise them, logstash is the way to go (make sure to check Jordan Sissel’s video).

Replacing Wordpress with Octopress

Sun, 22 Feb 2015 19:18:44 +0100

About a month ago, I once again embarked on an adventure to replace my WordPress blogging platform with a more Ruby-friendly alternative. I say ‘once again’ because the last time I attempted this, after about 30 minutes of somewhat futile searching, I simply gave up. This time, the urge was stronger, and I was lucky to have better results. But before diving into the details of my quest, let me explain my motivation for making this change.

The motivation

Most people would likely agree that WordPress is one of the most powerful blogging platforms worldwide. It’s highly customizable and, even more importantly, has a huge number of powerful add-ons that cover everything from sitemap generation and Google Analytics to backups, Dropbox integration, themes, and more. After all, when you want to publish content on the internet, your primary concern is the ‘what’ rather than the ‘how,’ isn’t it? When I started writing short pieces for this site, my main concern was practicing writing, so WordPress seemed like a suitable tool.

Perhaps that’s the right choice for most people, but I had an itch that needed scratching. As a professional developer, WordPress didn’t quite satisfy me. I also wanted a more Ruby-friendly solution since I’m a Ruby developer and thought it might allow me to build something on top of it one day.

Budget considerations also played a role in my decision. I run this blog on DigitalOcean’s smallest instance with 512MB of RAM, and I’ve experienced issues where my blog went down due to MySQL consuming too much RAM. Although I enabled Linux swap once I identified the problem, the idea that WordPress was overkill for my needs stuck in my mind.

Enter The Octopress!

In short, I discovered Octopress, which allows me to create posts using Markdown and then generates static HTML pages served by nginx. This means no database, no server-side code, no swapping, and no complicated backup/restore add-ons - just a code repository!

It works similarly to tools like the AngularJS toolchain, with templates on one end, a templating engine in between, and raw HTML, CSS and JavaScript on the other. The key difference is that there are no Node.js, Grunt.js, Bower or other tools to manage - just some Ruby code. Sweet!
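To make the "templates in, static HTML out" idea concrete, here is a minimal sketch using only ERB from Ruby's standard library. It is not how Octopress is implemented; the LAYOUT template and the render_post helper are made up for illustration, and the post body is assumed to be HTML already (Markdown rendering is left out).

```ruby
require "erb"

# Hypothetical page layout; a real generator would load this from a template file.
LAYOUT = ERB.new(<<~HTML)
  <article>
    <h1><%= ERB::Util.html_escape(title) %></h1>
    <%= body %>
  </article>
HTML

# Renders one post into a static HTML string that could be written to disk
# and served by any web server, with no database or server-side code involved.
def render_post(title:, body:)
  LAYOUT.result_with_hash(title: title, body: body)
end

puts render_post(title: "Hello & welcome", body: "<p>First post.</p>")
```

The output is plain text, so "deploying" amounts to copying files, and a backup is just the repository that holds the templates and posts.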

CORS font issues with Rails, Heroku, CloudFront and Passenger

Sat, 13 Sep 2014 17:41:41 +0100

Ever seen a message in your browser’s console saying some resources like web fonts could not be loaded because Access-Control-Allow-Origin headers were missing? Did you think “oh, this should be easy” and then spend hours searching through various misleading articles, and even more hours applying that advice and still failing? Well, I sure did, and here’s the story of how I finally won.

TL;DR;

You need to get Passenger’s nginx template, modify it to attach CORS headers and use it instead of the default one.

The Setup

My Rails app is hosted on Heroku, and assets are served from a CloudFront distribution that has a custom origin pointing back to the Rails app. Heroku precompiles my assets during slug compilation and stores them under the folder public/assets (check the assets and cloudfront Heroku documents for details). All that is powered by standalone Passenger, just recently upgraded from Unicorn.

My config/environments/production.rb file contains something like this:

config.serve_static_assets = true
config.action_controller.asset_host = "//something.cloudfront.net"

The first line means that my app’s assets will be served from the Rails app and not from nginx. Rails injects a special middleware here (previously Rack::Static and more recently ActionDispatch::Static) that serves all files from the public folder. So whenever some resource is requested from the web app, it is first inspected by the middleware. If the file is found, it is served directly from the file system. If not, the request travels through the usual Rails routing and controllers. This is useful if we would like to control custom headers for those resources.
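The "check the file system first, otherwise fall through" behaviour can be sketched as a tiny Rack-style middleware. This is only an illustration of the idea, not the code of Rack::Static or ActionDispatch::Static; the StaticFirst class name is hypothetical, and real implementations also handle content types, caching headers and more.

```ruby
# Rack-style middleware sketch: serve a file from `root` if it exists,
# otherwise hand the request to the wrapped app (routing, controllers, ...).
class StaticFirst
  def initialize(app, root)
    @app = app
    @root = File.expand_path(root)
  end

  def call(env)
    path = File.expand_path(File.join(@root, env["PATH_INFO"].to_s))
    # Only serve regular files that stay inside the public root.
    if path.start_with?(@root + File::SEPARATOR) && File.file?(path)
      [200, { "content-length" => File.size(path).to_s }, [File.binread(path)]]
    else
      @app.call(env)
    end
  end
end
```

Because the file is served from inside the Ruby process, anything stacked in front of this middleware gets a chance to add response headers, which is exactly what the setting is supposed to enable.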

I know what I’m doing…

The issue of missing CORS headers for web fonts was, I thought initially, a walk in the park. First I would manually inject those CORS headers by using some middleware injection magic, or even better, I would use the font_assets gem. Then I would invalidate font assets in CloudFront to force a cache refresh and to get proper CORS headers. Unfortunately, it didn’t work. Whatever I tried, CORS headers were nowhere to be seen.
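For reference, the "middleware injection magic" could look roughly like the sketch below. It is a hand-written illustration, not the font_assets gem's implementation; the FontCors class name and the font extension list are assumptions. It only has an effect when the request actually reaches the Ruby app.

```ruby
# Rack-style middleware sketch that adds a CORS header to font responses.
class FontCors
  FONT_PATH = /\.(eot|svg|ttf|otf|woff)\z/

  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    if env["PATH_INFO"].to_s.match?(FONT_PATH)
      # merge returns a new hash, so a frozen headers hash is not a problem
      headers = headers.merge("Access-Control-Allow-Origin" => "*")
    end
    [status, headers, body]
  end
end
```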

Sobering up

Of course, the real breakthrough came only when I started paying much closer attention to what was being returned from those requests. If I requested a valid resource that existed in the public folder, I got Server: nginx/1.6.1. But if I requested some file that didn’t exist, I got Server: nginx/1.6.1 + Phusion Passenger 4.0.50 along with all the CORS headers I could ever have hoped for (and a 404 error too). This meant that my serve_static_assets setting didn’t work; nginx was somehow instructed to serve my static assets, without my consent.

It turned out that the Passenger standalone gem installs its own nginx configuration file, unlike the much simpler Unicorn gem I had previously. Here’s what it looked like in the original config.erb:

# Rails asset pipeline support.
location ~ "^/assets/.+-[0-9a-f]{32}\..+" {
    error_page 490 = @static_asset;
    error_page 491 = @dynamic_request;
    recursive_error_pages on;

    if (-f $request_filename) {
        return 490;
    }
    if (!-f $request_filename) {
        return 491;
    }
}
location @static_asset {
    gzip_static on;
    expires max;
    add_header Cache-Control public;
    add_header ETag "";
}
location @dynamic_request {
    passenger_enabled on;
}

Let me explain those few lines:

If a resource has assets in its path, has a digest in its name and actually exists on the file system, it will be treated as a static resource and served by the location @static_asset setting.
If such a resource’s file doesn’t exist on the file system, it will use location @dynamic_request, i.e. go to the Rails app via Passenger.
If a resource doesn’t have assets in its path and/or doesn’t have a 32-character digest in its name, it will always be treated as static content with the usual location @static_asset code. My web fonts were such resources.
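The three cases can be mimicked in a few lines of Ruby. This is only a model of the behaviour described in this post, not of nginx itself: the handled_by helper and its return symbols are made up, and the "file does not exist" branch for non-digest paths follows what I observed earlier (such requests came back with the Passenger header).

```ruby
# Same pattern as the location block in the template, anchored for Ruby.
DIGEST_ASSET = %r{\A/assets/.+-[0-9a-f]{32}\..+}

# Returns which part of the stack answers a request for `path`.
def handled_by(path, exists_on_disk)
  return :rails_via_passenger unless exists_on_disk
  # Existing files never reach Rails; only the nginx block that serves them differs.
  path.match?(DIGEST_ASSET) ? :static_asset_block : :nginx_static
end

puts handled_by("/assets/app-0123456789abcdef0123456789abcdef.css", true)
puts handled_by("/assets/icons.woff", true)
puts handled_by("/assets/icons.woff", false)
```

The takeaway is that any file present on disk is answered before Rails is involved, so no Rails middleware can attach headers to it.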

The Solution

What I did then is pretty straightforward; I copied that whole template, stored it in my config folder and modified it a bit. Here’s what I’ve changed to serve CORS headers (but only with web fonts):

# Rails asset pipeline support.
location ~ "^/assets/.+-[0-9a-f]{32}\..+" {
    error_page 490 = @static_asset;
    error_page 491 = @dynamic_request;
    recursive_error_pages on;

    if (-f $request_filename) {
        return 490;
    }
    if (!-f $request_filename) {
        return 491;
    }
}
# Fonts in assets that don't contain digest in file name.
location ~ "^/assets/.+\.(eot|svg|ttf|otf|woff)" {
    error_page 490 = @static_asset_fonts;
    error_page 491 = @dynamic_request;
    recursive_error_pages on;

    if (-f $request_filename) {
        return 490;
    }
    if (!-f $request_filename) {
        return 491;
    }
}
location @static_asset {
    gzip_static on;
    expires max;
    add_header Cache-Control public;
    add_header ETag "";
}
location @static_asset_fonts {
    gzip_static on;
    expires max;
    add_header Cache-Control public;
    add_header ETag "";
    add_header 'Access-Control-Allow-Origin' '*';
    add_header 'Access-Control-Allow-Methods' 'GET, HEAD, OPTIONS';
    add_header 'Access-Control-Allow-Headers' '*';
    add_header 'Access-Control-Max-Age' 3628800;
}
location @dynamic_request {
    passenger_enabled on;
}

Besides modifying the nginx template, I needed to add the --nginx-config-template parameter to the Procfile, along with a path to my copy of the template (for that parameter to work you need Passenger >= 4.0.39).

web:
  bundle exec passenger start -p $PORT \
  --max-pool-size ${WEB_CONCURRENCY:-3} \
  --nginx-config-template ./config/passenger_config.erb

The only remaining thing is to remember to update i.e. merge Passenger’s nginx template with my changes whenever I decide to update that gem.

A simple Ruby sitemap.xml generator

Sat, 12 Apr 2014 18:17:02 +0100

Yesterday, I completed a simple Ruby CLI tool that I’ve named SiteMapper. Its main purpose is to generate a sitemap.xml file, a format widely recognized by many popular search engines. You can find the tool at this GitHub link: https://github.com/okulik/lame-sitemapper.

During my initial tests, I realized that having a visual representation would be quite cool, rather than relying solely on space-indented text logs. As a result, I added a feature to generate a .dot file, which can then be converted into a .png image using the graphviz tool.

SiteMapper essentially serves as a straightforward, static web page hierarchy explorer. It starts from a page of your choice and navigates through the web site’s structure by following links. It will continue until it has traversed all the available content or until it reaches a predefined depth limit.

Links Normalization

The primary challenge in traversing links was determining whether a link had been visited before or not. Without a reliable mechanism, there would be a risk of endlessly navigating through pages, potentially stuck in a loop and jumping from one page to another indefinitely. To tackle this issue, I implemented a method for normalizing raw URLs. This involved expanding each ‘href’ value to its full path, removing any fragments, and sorting query parameters alphabetically. Let’s take a look at some of the Ruby code responsible for this process.

def self.get_normalized_url(host_url, resource_url)
  host_url = Addressable::URI.parse(host_url)
  resource_url = Addressable::URI.parse(resource_url)
 
  m = {}
  m[:scheme] = host_url.scheme unless resource_url.scheme
  unless resource_url.host
    m[:host] = host_url.host
    m[:port] = host_url.port
  end
  resource_url.merge!(m) unless m.empty?
  return nil unless SUPPORTED_SCHEMAS.include?(resource_url.scheme)
  return nil unless PublicSuffix.valid?(resource_url.host)
  resource_url.omit!(:fragment)
  # Sort query parameters so equivalent URLs compare equal.
  unless resource_url.query.nil? || resource_url.query.empty?
    resource_url.query = resource_url.query.split("&").map(&:strip).sort.join("&")
  end
 
  return Addressable::URI.encode(resource_url, ::Addressable::URI).normalize
rescue Addressable::URI::InvalidURIError, TypeError
  nil
end

We parse the URL string and convert it to an Addressable::URI object (addressable is a Ruby gem that serves as a replacement for the URI implementation that is part of Ruby’s standard library).
The host parameter is created from the starting URL, the one which we chose as the starting point of our web site quest. It is also converted to Addressable::URI here.
If a URL is given without a scheme, often in the form of //www.nisdom.com/a-simple-ruby-sitemap-xml-generator/, we take the scheme and port number from the host. By calling merge, we also ensure that URLs like /a-simple-ruby-sitemap-xml-generator will end up with a host name too.
Check if the host part of our URL is valid with the PublicSuffix gem. Since HTML can contain any kind of text, we want to separate the wheat from the chaff and make the content we will scrape as good as possible.
Remove everything to the right of the # mark (i.e. fragments), since in most cases this will result in the same HTML content. Of course, if we are dealing with routing features of single page apps written with e.g. AngularJS, we might get different content with different fragments (and different content might mean more URLs to crawl). But, as previously mentioned, SiteMapper is simple and deals only with static content.
Alphabetically sort query parameters. We don’t support JavaScript, forms and whatnot, but we do handle query parameters as they are rather easy (and I get to use that nice Ruby one-liner).
Finally, we encode any spaces and other non-URL compatible characters. Addressable to the rescue once again.
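If you want to play with these steps without installing any gems, here is a rough approximation built on Ruby's standard URI library. It is not SiteMapper's code: normalize_url is a hypothetical helper, the PublicSuffix host check is omitted, the scheme list is an assumption, and URI.join does the scheme and host filling that merge! does above.

```ruby
require "uri"

# Stdlib-only approximation of the normalization steps described above.
def normalize_url(host_url, resource_url)
  # Resolves relative paths and scheme-less //host/path links against the start URL.
  uri = URI.join(host_url, resource_url)
  return nil unless %w[http https].include?(uri.scheme)

  uri.fragment = nil
  if uri.query && !uri.query.empty?
    uri.query = uri.query.split("&").map(&:strip).sort.join("&")
  end
  uri.normalize.to_s
rescue URI::InvalidURIError
  nil
end

puts normalize_url("http://www.nisdom.com/", "/post?b=2&a=1#comments")
puts normalize_url("https://nisdom.com", "//www.nisdom.com/about")
```

Two links that differ only in fragment or in the order of their query parameters now produce the same string, which is what makes the "have I seen this page?" check reliable.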

There are a couple more interesting places, and Crawler#should_crawl_page? is one of them:

def should_crawl_page?(host, page, depth)
  unless UrlHelper.is_url_same_domain?(host, page.path)
  ...
  if @robots && @robots.disallowed?(page.path.to_s)
  ...
  if depth >= @opts[:max_page_depth].to_i
  ...
end

When traversing from page to page, should_crawl_page? is called for each newly encountered link. It checks whether the link belongs to the same domain as the one we started with, whether the link is allowed by the robots.txt file, and whether we have reached the maximum traversal depth. is_url_same_domain? is dead simple:

def self.is_url_same_domain?(host_url, resource_url)
  ...
  host_url.host == resource_url.host
end

One more interesting method is is_url_already_seen?, which, once a URL is normalized, tries to match it against previously seen URLs. If the URL was already seen, we simply ignore that path.

def is_url_already_seen?(url, depth)
  if @seen_urls[Digest::MurmurHash64B.hexdigest(url.omit(:scheme).to_s)]
  ...
end
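The idea behind the lookup can be sketched with the standard library alone. This is an illustration, not the tool's code: the SeenUrls class is hypothetical, SHA-256 from Digest stands in for the MurmurHash64B digest used above, and the scheme is stripped with a regular expression instead of Addressable's omit.

```ruby
require "digest"
require "set"

# Remembers normalized URLs by digest, ignoring the scheme so that
# http:// and https:// variants of the same page count as one.
class SeenUrls
  SCHEME = /\A[a-z][a-z0-9+.-]*:/i

  def initialize
    @digests = Set.new
  end

  # Returns true the first time a URL is offered and false on repeats.
  def first_visit?(url)
    key = Digest::SHA256.hexdigest(url.sub(SCHEME, ""))
    !@digests.add?(key).nil?
  end
end

seen = SeenUrls.new
puts seen.first_visit?("http://www.nisdom.com/about")
puts seen.first_visit?("https://www.nisdom.com/about")
```

Storing fixed-size digests rather than full URLs keeps the lookup table compact even when the crawled URLs are long.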

Concurrent Downloads

Another intriguing aspect worth exploring is how pages are downloaded and processed concurrently. Given that downloading pages via HTTP is predominantly I/O-bound, it’s OK to create multiple threads and delegate downloads to them, even within MRI. To accomplish this, I implemented a producer-consumer concurrency pattern. Let’s go through the process step by step. The following code snippets are extracted from the Core#start method, which represents the main thread of execution.

urls_queue = Queue.new
pages_queue = Queue.new
seen_urls = {}
threads = []
root = nil
 
Thread.abort_on_exception = true
(1..@opts.scraper_threads.to_i).each do |index|
  threads << Thread.new { Scraper.new(seen_urls, urls_queue, pages_queue, index, @opts, @robots).run }
end
 
urls_queue.push(host: host, url: start_url, depth: 0, parent: root)
loop do
  msg = pages_queue.pop
  if msg[:page]
    msg[:page].anchors.each do |anchor|
      urls_queue.push(host: host, url: anchor, depth: msg[:depth] + 1, parent: msg[:page])
    end
    ...
  end
  ...

Here we create two queues and a set of scraper threads. The main thread interacts with the scraper threads through these two queues. When there’s a need to fetch a particular page, a message is sent to the urls_queue, and the completed page objects, which are created and assembled by the scraper threads, are obtained from the pages_queue.

  ...
  if urls_queue.empty? && pages_queue.empty?
    until urls_queue.num_waiting == threads.size
      Thread.pass
    end
    if pages_queue.empty?
      threads.size.times { urls_queue << nil }
      break
    end
  end
end
 
threads.each { |thread| thread.join }

Here we attempt to determine if we’ve completed the task. If both queues are empty but some threads are still actively processing pages (i.e., not all scraper threads are blocked, waiting on the urls_queue), we call Thread.pass within the loop to signal to the scheduler that we’re yielding our quota - this is Ruby’s equivalent of sleep(0). Once all scraper threads are idle, we check if there are any remaining pages waiting to be processed. If there are, we loop back to the beginning of the main loop. However, if there are no more pages, we send as many nil messages to the urls_queue as we have scraper threads and then wait for all of them to complete.

The main method of the scraper threads is quite simple. It dequeues messages containing page URLs to be processed and invokes the create_page method, which fetches the HTML, parses it (using the excellent Nokogiri gem), and ultimately generates a page object. This object is then pushed back into the pages_queue, from where the main thread takes charge and integrates it into the directed graph of pages.

loop do
  msg = @urls_queue.pop
  unless msg
    LOGGER.debug "scraper #{@index} received finish message"
    break
  end
 
  page = create_page(msg)
 
  @pages_queue.push(page: page, url: msg[:url], depth: msg[:depth], parent: msg[:parent])
end
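For readers who want to run the pattern end to end, here is a small self-contained sketch. It is not SiteMapper's code: the SITE hash is a made-up in-memory site standing in for HTTP downloads, and instead of polling num_waiting it counts URLs that were pushed but not yet returned (the pending variable), which is a simpler, different way to detect completion.

```ruby
require "set"

# Hypothetical in-memory "site": page => links found on that page.
SITE = {
  "/"  => ["/a", "/b"],
  "/a" => ["/b", "/c"],
  "/b" => ["/"],
  "/c" => []
}.freeze

def crawl(start, worker_count: 3)
  urls_queue = Queue.new
  pages_queue = Queue.new
  seen = Set.new([start]) # touched only by the main thread

  workers = Array.new(worker_count) do
    Thread.new do
      # pop blocks until a URL arrives; a nil message means "finish".
      while (url = urls_queue.pop)
        pages_queue.push(url: url, links: SITE.fetch(url, []))
      end
    end
  end

  urls_queue.push(start)
  pending = 1 # URLs handed to workers whose pages have not come back yet

  while pending > 0
    page = pages_queue.pop
    pending -= 1
    page[:links].each do |link|
      next unless seen.add?(link) # skip links we already queued
      urls_queue.push(link)
      pending += 1
    end
  end

  worker_count.times { urls_queue.push(nil) }
  workers.each(&:join)
  seen.to_a.sort
end

p crawl("/")
```

The main thread is the only one that touches the set of seen URLs, so no locking is needed; the two queues are the only shared state, and Queue is thread-safe.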

Conclusion

In a nutshell, the SiteMapper Ruby CLI tool allows simple generation of sitemap.xml files. It not only simplifies web page hierarchy exploration but also offers a nice visual representation, making the process more intuitive. Here I provided a sneak peek into its inner workings, from URL normalization to concurrent downloads, making it perhaps a handy tool for web developers.

Embedding a 64-bit binary in a 32-bit Windows C++ app

Tue, 11 Feb 2014 19:56:24 +0100

A while back, I ran into this tricky issue on Windows 7. I had a small Windows tray utility written in C++ that was supposed to do a few things, including checking if Microsoft Word was running and if it was the top-level app. To do this, I was using the EnumProcesses API. Here’s what that looked like:

if (::EnumProcesses(aProcesses, sizeof(aProcesses), &cbNeeded)) {
  cProcesses = cbNeeded / sizeof(DWORD);
  for (i = 0; i < cProcesses; i++) {
    if (aProcesses[i] != 0) {
      HANDLE hProcess = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, aProcesses[i]);
      if (NULL != hProcess) {
        HMODULE hMod;
        DWORD cbNeeded;
        if (::EnumProcessModules(hProcess, &hMod, sizeof(hMod), &cbNeeded)) {
          ::GetModuleFileNameEx(hProcess, hMod, processName, sizeof(processName));

Everything was going fine until one day the app couldn’t detect Word anymore. Surprise, surprise, it acted up when I ran it for the first time on Windows 7 64-bit with 64-bit Word 2010. It turned out that from a 32-bit virtualized process (running under the WOW64 subsystem), I couldn’t enumerate 64-bit processes!

There were two obvious ways to tackle this problem: a) create a 64-bit version of my tray app and distribute it separately, and b) use WMI for the job. I won’t go into too many details about why I ditched WMI. Let’s just say it’s a lot slower compared to using WIN32 APIs directly, and it makes you deal with COM. More importantly, I already had the existing code base up and running, and I didn’t want to turn everything upside-down when all I needed was to create another project file and check all the right boxes for 64-bit compilation.

So, I went with the 64-bit compilation solution, but there was still something bothering me. How could I avoid including two separate executable images in the installation package? Then I remembered something interesting I had noticed while using the super popular Process Explorer utility on 64-bit Windows, although I didn’t pay much attention to it before. Here’s what I saw:

Turns out, Process Explorer creates a 64-bit version of itself, and what’s even more interesting is that the 32-bit version bundles its 64-bit counterpart as a binary resource. When procexp.exe is launched, it extracts the 64-bit binary from its resources to a temporary folder and runs it as a child process. That’s exactly what I was hoping to achieve. So, let me show you how I did it, so you can create your own self-unpacking 64-bit version of your 32-bit app!

Embedding a 64-bit binary as a resource

First things first, you’ll need to create an additional 64-bit Visual Studio configuration for your project. This configuration should compile your code to run natively on 64-bit Windows.

Next, go to your 32-bit project and, under Resources -> General properties, add something like $(SolutionDir)x64\$(Configuration) to Additional Include Directories. This will allow you to reference your 64-bit binary from the .rc file.

Embedding a 64-bit binary in a 32-bit binary’s resources is quite straightforward. Just add the following code somewhere inside your .rc file:

#if !defined (_WIN64)
IDR_MYTRAYAPP64 RCDATA "MyTrayApp64.exe"
#endif

Also make sure you have the following inside the Resource.h file.

#ifndef _WIN64
#define IDR_MYTRAYAPP64              400
#endif

Feel free to replace 400 with any value that suits your needs better. Also, make sure to add the _WIN64 preprocessor constant to your 64-bit project file. If you attempt to build a 32-bit binary at this point, you’ll notice that its size has increased by the size of the 64-bit binary.

Extracting 64-bit binary

Let’s wrap up the configurations for now and dive into some code snippets that will help us extract the binary from resources and more.

We’ll begin with the trusty old entry point function, WinMain:

int APIENTRY WinMain(HINSTANCE hInstance, HINSTANCE /*hPrevInstance*/,
                    LPSTR lpCmdLine, int nCmdShow) {
...
#if defined _WIN64
  if (!getCmdOption(__argv, __argv + __argc, "-run64", out)) {
    displayMessage(hInstance, IDS_MISSING_ARG_RUN64);
    return FALSE;
  }
#endif
...

In this section, I’m reading various command line options. The code for both the 32-bit and 64-bit versions is identical, but I’m using preprocessor guards for platform-specific handling. The intention is to ensure that the 64-bit binary cannot be run independently, essentially giving a warning to accidental double-clickers.

Moving on, further down the road, we encounter something like this:

#if !defined (_WIN64)
  if (is64BitWindows()) {
    HRSRC res = ::FindResource(hInstance, MAKEINTRESOURCE(IDR_MYTRAYAPP64), RT_RCDATA);
    if (!res) {
      LOG(ERROR) << "unable to find embedded resource MyTrayApp64.exe";
      return false;
    }
    HGLOBAL resHandle = ::LoadResource(NULL, res);
    if (!resHandle) {
      LOG(ERROR) << "unable to load resource MyTrayApp64.exe";
      return false;
    }
    char *resData = (char*)::LockResource(resHandle);
    DWORD resSize = ::SizeofResource(NULL, res);

    char tempPath[MAX_PATH + 1];
    DWORD tempPathSize = ::GetTempPath(MAX_PATH, tempPath);
    if (tempPathSize == 0 || tempPathSize > MAX_PATH) {
      LOG(ERROR) << "unable to get path to temporary folder";
      return false;
    }

    string targetPath(tempPath);
    targetPath.append("MyTrayApp64.exe");
    ofstream outputFile(targetPath, std::ios::binary);
    outputFile.write((const char *)resData, resSize);
    outputFile.close();

is64BitWindows is a small utility function that returns true if our 32-bit process is running under the WOW64 subsystem, i.e. on 64-bit Windows.

#if !defined (_WIN64)
  bool is64BitWindows() {
    BOOL f64 = FALSE;
    return ::IsWow64Process(::GetCurrentProcess(), &f64) && f64;
  }
#endif

The next step involves loading and locking a specific resource and then writing it to a file in a temporary folder. That part was easy! However, before running that image, there’s one more small thing to do: creating a new job object and assigning my currently running process to it.

HANDLE job = NULL;
if (processInJob == 0) {
  job = ::CreateJobObject(NULL, NULL);
  if (NULL == job) {
    LOG(ERROR) << "unable to create job object";
    return false;
  }

  JOBOBJECT_EXTENDED_LIMIT_INFORMATION jeli;
  ::ZeroMemory(&jeli, sizeof(jeli));
  jeli.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE |
    JOB_OBJECT_LIMIT_BREAKAWAY_OK;
  if (0 == ::SetInformationJobObject(job, JobObjectExtendedLimitInformation,
                                    &jeli, sizeof(jeli))) {
    LOG(ERROR) << "unable to setup job object";
    ::CloseHandle(job);
    return false;
  }

  if (0 == ::AssignProcessToJobObject(job, ::GetCurrentProcess())) {
    LOG(ERROR) << "could not assign process to job object, error: " << ::GetLastError();
    ::CloseHandle(job);
    return false;
  }

  LOG(INFO) << "assigned process pid " << ::GetCurrentProcessId() << " with a job";
}

Once a process is assigned to a job, any child processes it creates afterward become part of the same job by default. An intriguing flag here is JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE: when the last handle to the job is closed, which typically happens when the parent process terminates, all processes still in the job are terminated too. This way, I don’t have to concern myself with the 64-bit process lingering after its 32-bit counterpart has exited.

Creating a 64-bit process

Lastly, I can create a 64-bit process:

STARTUPINFO si;
PROCESS_INFORMATION pi;
ZeroMemory(&si, sizeof(si));
si.cb = sizeof(si);
ZeroMemory(&pi, sizeof(pi));

string params(lpCmdLine);
params.append(" -run64 \"\"");

LOG(INFO) << "starting 64 bit app";
if (::CreateProcess(targetPath.c_str(), const_cast<char*>(params.c_str()),
                    NULL, NULL, TRUE, 0, NULL, NULL, &si, &pi)) {
  LOG(INFO) << "waiting for 64 bit app to exit";
  ::WaitForSingleObject(pi.hProcess, INFINITE);
  ::CloseHandle(pi.hThread);
  ::CloseHandle(pi.hProcess);
  if (NULL != job)
    ::CloseHandle(job);
}

One interesting point to observe here is that right after creating the 64-bit process, the 32-bit process simply waits for it to terminate. This might seem a bit odd, but it sets the stage for running both versions of the app simultaneously (hence the use of the job). How they will communicate is something you’ll need to determine based on your specific requirements.

Conclusion

In a nutshell, this clever method of embedding a 64-bit binary within a 32-bit Windows C++ application solves the problem of detecting and running on 64-bit systems. By following the steps detailed in this article, you can have a single codebase that smoothly adapts to both architectures. The nifty use of job objects and process management helps you control the execution of both 32-bit and 64-bit parts of your application. Happy coding!

neo4j tweaks

Fri, 25 Jan 2013 22:45:00 +0100

Increase the number of open files for all users:

sudo vi /etc/security/limits.conf
*       soft    nofile  100000
*       hard    nofile  100000
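To check that the new limits are picked up (log out and back in first), you can ask for the limits of the current process. The snippet below is a small convenience sketch, with nofile_limits being a made-up helper name; Process.getrlimit is available on Unix-like systems.

```ruby
# Returns the soft and hard limits on open files for the current process.
def nofile_limits
  soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
  { soft: soft, hard: hard }
end

limits = nofile_limits
puts "open files: soft=#{limits[:soft]} hard=#{limits[:hard]}"
```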

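Add JVM tweaks to neo4j-wrapper.conf. Each wrapper.java.additional entry needs its own index, otherwise only one of the lines sharing an index takes effect; if the file already contains such entries, continue numbering from the highest existing index:

sudo vi /usr/local/neo4j/conf/neo4j-wrapper.conf
wrapper.java.additional.1=-d64
wrapper.java.additional.2=-server
wrapper.java.additional.3=-Xss2048k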

Ruby http.rb segmentation fault solution for OSX

Fri, 23 Nov 2012 08:15:12 +0100

If you’re encountering segmentation faults with your Ruby interpreter on your Mac, be it Ruby 1.8.6, 1.9.2, or 1.9.3, and your code involves interactions with HTTPS endpoints, the issue may stem from a faulty installation of OpenSSL via MacPorts.

To resolve this issue, you’ll need to build OpenSSL from source and then rebuild Ruby while specifying the path to the OpenSSL installation. If you’re like me and use RVM to manage different Ruby versions on your machine, here’s how you can do it:

rvm pkg install openssl
rvm uninstall 1.9.2
rvm install 1.9.2 --with-openssl-dir=$HOME/.rvm/usr
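After rebuilding, it is worth confirming which OpenSSL the new Ruby actually uses. The sketch below relies on constants from Ruby's openssl library; openssl_build_info is a made-up helper name, and OPENSSL_LIBRARY_VERSION is only present in newer versions of the library, hence the guard.

```ruby
require "openssl"

# Reports the OpenSSL version Ruby was compiled against and, where the
# constant is available, the version of the library loaded at runtime.
def openssl_build_info
  info = { compiled: OpenSSL::OPENSSL_VERSION }
  if OpenSSL.const_defined?(:OPENSSL_LIBRARY_VERSION)
    info[:loaded] = OpenSSL::OPENSSL_LIBRARY_VERSION
  end
  info
end

openssl_build_info.each { |kind, version| puts "#{kind}: #{version}" }
```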

Farewell to OCZ Vertex 2

Tue, 29 May 2012 13:12:07 +0100

About a week ago, my trusty iMac from 2010 decided to throw a curveball at me. It simply refused to boot, leaving me staring at a blinding white screen with no comforting Apple logo in sight. I tried every keyboard shortcut in the book, but the only one that seemed to do anything was resetting the NVRAM. Much to my relief, the reset worked, and I was greeted with that familiar and reassuringly loud Apple boot sound.

With my iMac springing back to life, my initial fears of catastrophic hardware failure (you know, the motherboard, CPU, or RAM giving up) began to fade. Instead, I shifted my attention towards the SSD. It was only about a year ago that I had replaced the internal DVD drive with OCZ’s Vertex 2 120GB SSD.

As I disconnected the SSD from its SATA cables, my beloved iMac sprang back to life. It became clear that the SSD was the culprit in this booting thriller. A friend of mine had encountered a similar issue with the same iMac model a couple of months earlier, but his experience was slightly less tragic. His SSD had gone completely blank, but with an HDD backup, he was able to recover his data. Unfortunately, I wasn’t as fortunate, as my SSD went kaput. However, I prefer this sudden death to the slow and agonizing decline of a failing drive.

Thankfully, OCZ’s 3-year warranty came to the rescue, and I received a replacement Vertex 3 SSD. With this new drive in hand, I was able to restore my system from a Time Machine backup. This whole episode served as a stark reminder that having an SSD in your computer today without a reliable day-to-day backup is a disaster just waiting to happen.
