Fetching Vendor Data at Scale with Serverless

I’ve been using the Serverless Framework for a while now to rapidly develop and provision Lambda functions, as well as other dependent AWS resources — it makes a developer’s life much easier.

While you can certainly work with serverless functions outside of the Serverless Framework, it offers some major benefits that streamline deployment, including a powerful plugin system to enhance its default capabilities.

A recent request at my company had requirements that seemed like a good fit for serverless:

  1. process a one-time backlog of one million requests to a vendor’s API
  2. minimize batch processing time while respecting the API rate limit

If your first question is whether the vendor could simply run this batch on their end and send us the output: unfortunately, no!

Approach

As with most designs, this task could have been accomplished in a number of different ways. Since it was a one-time job, I opted for a solution requiring minimal development time while still remaining reasonably performant.

The vendor API I was requesting data from enforced a rate limit of 600 requests per minute — for the sake of this post, assume our batch process can consume all 600. This means that, if run at full speed, the job should complete in roughly 28 hours: (1,000,000 / 600) / 60 ≈ 27.78 hours.

Batch API

This estimate wasn’t terrible, but it was based on requesting a single record at a time. That’s a lot of round trips and traffic across the wire. After reading through the vendor’s documentation a bit more, I found they offered a batch version of the same endpoint — this one accepted up to 300 records per request, significantly lowering submission time.

Note that I say submission and not processing time. The batched results were returned via a webhook, i.e., a POST containing the results, sent to a client-specified callback endpoint. In fact, this is one of the reasons cloud resources were needed at all instead of running the whole process locally.
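
To make the flow concrete, here’s a minimal sketch of a single batch submission. The endpoint, authentication, and field names (records, callback_url, batch_id) are hypothetical stand-ins, since the vendor’s actual API isn’t public:

import os
import requests

# Hypothetical vendor batch endpoint and API key; stand-ins for the real (private) API
BATCH_URL = "https://api.vendor.example/v1/records/batch"
API_KEY = os.environ["VENDOR_API_KEY"]

def submit_batch(records, callback_url):
    """Submit up to 300 records and tell the vendor where to POST the results."""
    response = requests.post(
        BATCH_URL,
        json={"records": records, "callback_url": callback_url},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    response.raise_for_status()
    # The vendor acknowledges immediately with a batch ID; results arrive later via the webhook
    return response.json()["batch_id"]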

Here’s a simple visualization of the batch flow:

This asynchronous model is clearly much more efficient than fetching individual responses. Now, the process only required ~3,334 submissions (1,000,000 / 300, rounded up) and, at 600 submissions per minute, I could submit them all in under 6 minutes… assuming the vendor endpoint was able to keep up!
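
Here’s the same arithmetic, for both the single-record and batched approaches:

# Back-of-the-envelope math for both approaches
TOTAL_RECORDS = 1_000_000
RATE_LIMIT = 600        # requests per minute
BATCH_SIZE = 300        # records per batch request

single_hours = TOTAL_RECORDS / RATE_LIMIT / 60      # ~27.8 hours
batches = -(-TOTAL_RECORDS // BATCH_SIZE)           # ceiling division: 3,334 submissions
batch_minutes = batches / RATE_LIMIT                # ~5.6 minutes

print(f"Single requests: ~{single_hours:.1f} hours")
print(f"Batched: {batches} submissions, ~{batch_minutes:.1f} minutes")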

Infrastructure

As hinted above, submitting batch requests at full rate-limit utilization quickly overwhelmed the vendor’s infrastructure (oops!) — the service degraded after a minute or so. Since it was now clear that batch submissions needed to be throttled, I added an SQS queue to store outgoing requests and a CloudWatch rule to trigger the “batch submit” Lambda function once a minute.
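
In code, one throttled submission pass boils down to something like the sketch below. The vendor endpoint and field names are the same hypothetical stand-ins as before, and where the real project splits this work between the invoke and submit functions (defined later in serverless.yml), this version combines them for brevity:

import json
import os

import boto3
import requests

sqs = boto3.client("sqs")
table = boto3.resource("dynamodb").Table(os.environ["DYNAMODB_TABLE"])

def lambda_handler(event, context):
    """Drain a throttled slice of the queue and submit each batch to the vendor."""
    for _ in range(10):                        # ReceiveMessage returns at most 10 messages per call
        messages = sqs.receive_message(
            QueueUrl=os.environ["SQS_URL"], MaxNumberOfMessages=10
        ).get("Messages", [])
        if not messages:
            break
        for message in messages:
            records = json.loads(message["Body"])
            # Same hypothetical vendor endpoint/fields as the earlier sketch
            response = requests.post(
                "https://api.vendor.example/v1/records/batch",
                json={"records": records, "callback_url": os.environ["GW_URL"]},
                timeout=30,
            )
            response.raise_for_status()
            batch_id = response.json()["batch_id"]
            # Track the submitted batch so the webhook can associate its results later
            table.put_item(Item={"id": batch_id, "mapping": message["Body"]})
            sqs.delete_message(
                QueueUrl=os.environ["SQS_URL"],
                ReceiptHandle=message["ReceiptHandle"],
            )

Capping the scheduled function at reservedConcurrency: 1 (shown later) ensures only one of these passes runs at a time, so the per-minute throttle holds even if a run takes longer than expected.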

The final solution looked like this:

And a breakdown of each resource:

Resource | Description
SQS Queue | Stores pending request batches sliced from the input CSV
DynamoDB Table | Stores the batchId of each submitted/pending batch request along with its mapping data
Load SQS script | Slices the input data into batches of 300 records and sends them to the SQS queue
Batch Submit Lambda (submit) | Fetches up to 100 queued messages and submits batch requests to the vendor API; persists each returned batch ID with its mapping attributes to the DynamoDB table
Webhook Receiver Lambda (webhook) | Fetches the matching DynamoDB record, associates the mapping attributes, stores the results, and deletes the record; scales out concurrently to handle incoming batch result requests
CloudWatch Rule | Triggers the batch submission Lambda function once per minute

Though this might seem like a lot for a quick n’ dirty process, each resource definition is pretty straightforward and the Lambda functions above are isolated to basic duties. This means minimal code for each and, thanks to the Serverless Framework, an easy deployment process. 
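
For instance, here’s roughly what the load script looks like. The CSV file name, its layout, and the idea of passing the queue URL via an SQS_URL environment variable are all assumptions:

import csv
import json
import os

import boto3

QUEUE_URL = os.environ["SQS_URL"]   # queue URL created by the deploy
BATCH_SIZE = 300                    # records per vendor batch request

sqs = boto3.client("sqs")

def load_csv(path):
    """Slice the input CSV into 300-record batches and enqueue one SQS message per batch."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))

    for start in range(0, len(rows), BATCH_SIZE):
        batch = rows[start:start + BATCH_SIZE]
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(batch))

if __name__ == "__main__":
    load_csv("records.csv")   # hypothetical input file name

One caveat with this approach: an SQS message body is capped at 256 KB, so it only works if a 300-record batch fits within that limit.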

Serverless

Next, let’s see how each resource is defined for deployment in the serverless.yml. 

service: batch
package:
  exclude:
    - '*.csv'
    
custom:
  queue_name: '${self:service}-${opt:stage, "dev"}-messages'
  stage: '${opt:stage, "dev"}'
  
provider:
  name: aws
  runtime: python3.6
  memorySize: 512
  timeout: 300
  versionFunctions: false
  stage: '${self:custom.stage}'
  region: us-east-1
  iamRoleStatements:
    -
      Effect: Allow
      Action:
        - 'dynamodb:Query'
        - 'dynamodb:Scan'
        - 'dynamodb:GetItem'
        - 'dynamodb:PutItem'
        - 'dynamodb:UpdateItem'
        - 'dynamodb:DeleteItem'
      Resource: 'arn:aws:dynamodb:${opt:region, self:provider.region}:*:table/${self:provider.environment.DYNAMODB_TABLE}'
    # Queue access for the scheduled function that reads and deletes pending batches
    -
      Effect: Allow
      Action:
        - 'sqs:ReceiveMessage'
        - 'sqs:DeleteMessage'
        - 'sqs:GetQueueAttributes'
      Resource:
        Fn::GetAtt: [Messages, Arn]
  environment:
    DYNAMODB_TABLE: '${self:service}-${self:custom.stage}'
    # API Gateway endpoint URL for receiving webhooks
    GW_URL: { "Fn::Join" : ["", [ "https://", { "Ref" : "ApiGatewayRestApi" }, ".execute-api.${self:provider.region}.amazonaws.com/${self:custom.stage}" ] ]  }
    SQS_URL:
      Ref: Messages
      
functions:
  # Receives the vendor's webhook POSTs containing batch results
  webhook:
    handler: webhook.lambda_handler
    events:
      -
        http:
          method: post
          path: /
  # Scheduled entry point; runs once per minute to drain queued batches
  invoke:
    handler: invoke.lambda_handler
    events:
      -
        schedule: 'rate(1 minute)'
        enabled: true
    reservedConcurrency: 1
  # Submits batch requests to the vendor API and records the returned batch IDs
  submit:
    handler: submit.lambda_handler
    
resources:
  Resources:
    Messages:
      Type: 'AWS::SQS::Queue'
      Properties:
        QueueName: '${self:custom.queue_name}'
        MessageRetentionPeriod: 1209600
        VisibilityTimeout: 300
    BatchTable:
      Type: 'AWS::DynamoDB::Table'
      DeletionPolicy: Delete
      Properties:
        AttributeDefinitions:
          -
            AttributeName: id
            AttributeType: S
        KeySchema:
          -
            AttributeName: id
            KeyType: HASH
        ProvisionedThroughput:
          ReadCapacityUnits: 600
          WriteCapacityUnits: 600
        TableName: '${self:provider.environment.DYNAMODB_TABLE}'
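
The Lambda handlers themselves stay small as well. Here’s a rough sketch of the webhook receiver; the callback payload fields (batch_id, results) are assumptions about the vendor’s format, and store_results is a placeholder for wherever the output actually needs to land:

import json
import os

import boto3

# Table name is injected via the environment block in serverless.yml
table = boto3.resource("dynamodb").Table(os.environ["DYNAMODB_TABLE"])

def store_results(results, mapping):
    # Placeholder: in the real job this wrote the mapped results to persistent storage
    print(json.dumps({"mapping": mapping, "result_count": len(results)}))

def lambda_handler(event, context):
    """Handle the vendor's callback: look up mapping data, store results, clean up."""
    payload = json.loads(event["body"])      # API Gateway proxy integration passes the raw body
    batch_id = payload["batch_id"]           # hypothetical vendor field names
    results = payload["results"]

    # Fetch the mapping attributes persisted when the batch was submitted
    item = table.get_item(Key={"id": batch_id}).get("Item", {})
    store_results(results, item)

    # The batch is complete, so the tracking record is no longer needed
    table.delete_item(Key={"id": batch_id})

    return {"statusCode": 200, "body": json.dumps({"ok": True})}

The concurrent scale-out mentioned in the resource table comes for free here: API Gateway invokes a separate instance of this function for each incoming callback.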

Running the serverless deploy command creates or updates all of these resources, including the appropriate IAM role and permissions. Running serverless remove deletes them again — perfect for a one-time process, since we won’t have to track down the temporary resources for manual removal.

Conclusion

The Serverless Framework, and serverless functions in general, are powerful and flexible. Serverless isn’t right for every use case and generally shouldn’t be considered the default answer, but improvements are constantly being made to address its common pitfalls. It’s hard to deny the impact FaaS will have on our software’s infrastructure in both the short and long term.

In future posts, I plan to expand on the other ways I’ve leveraged serverless, including some issues I’ve faced and potential workarounds/solutions.
