Appendix B — Devs

Dev resources go here

B.1 Organization

General org of code

B.1.1 Raw Functions

DDH package repo: ddh

B.1.2 Shiny Function

repo: private

B.1.3 Shiny modules

Notes on shiny modules

B.2 Data Generation

Notes on data generation (link to video?)

B.3 Data Serving

Notes on site

B.3.1 Reports

One of the key features of DDH is the ability to download a report for any gene. This report contains all of the information that is available on the website for a query, as well as a zipped directory of all of the images from the report, making it easy to use these images in presentations or manuscripts. The report takes a few minutes to render, and an early version of the app required a user to wait until the rendering was finished to download the file. If the user navigated away from that page or closed the browser, the rendering would stop and the user would not receive the report. This is not a good user experience. The challenge, then, is how to deliver reports while maintaining a good user experience.

In this section, we describe how we used a simple queuing system, combined with a containerized application, to generate and email reports entirely within R.

Shiny apps run as a single process that can handle concurrent users, but lengthy computational steps are processed serially unless you add explicit parallel processing. Thus, to ensure that users don't need to wait for another user's report to finish rendering, we decided to create a standalone report generator application. In this design, the primary application sends a message to a queuing service, and the report generator application polls that service for messages at a defined frequency, generating and sending a report for each message it receives. Once a report has been generated and sent, the message is consumed, and the application polls for the next one, continuously. Importantly, this is an entirely separate application from the primary Shiny app.

Because we are currently migrating the entire infrastructure over to AWS, we decided on Amazon's SQS service. This service can handle a much higher volume than we will ever generate, and it has a free tier (up to 1 million messages per month). Amazon offers two queue types: a standard queue and a first-in, first-out (FIFO) queue. The FIFO queue guarantees that messages are processed in the order they arrive; the standard queue may deliver messages out of order. Again, because of the anticipated low overall flux, the standard queue is suitable for our purposes.

Let’s consider each of the components separately before we put them together.

  1. Generating the report. We recently switched from rendering Rmd documents to rendering Quarto documents. The goal was to build a report template that uses specific, reusable child chunks, now pulled in via Quarto's include shortcode. The report template has default parameters, but the report generator function supplies the specific parameters needed to generate the report the user requested. We have three different report templates, depending on the type of report that needs to be generated; the reports then share some common sections and have some unique ones. Wrapping these parameters into arguments of a function allows the function to generate any report (see the render sketch after this list).
  2. Adding the images to a directory and zipping it all together. Previously, we needed to render the images, look in the environments of the function, grab the images that met a certain criterion, and append a vector with these images for inclusion in a zip file. With the move to Quarto, however, the images are rendered in a temporary directory, and after the report is fully rendered they can be moved from that directory into a defined directory for zipping. Thus, the deliverable is a zip file containing a report.html document and a directory of PNG images.
  3. Emailing the report. After a report is generated, it is quite simple to email the file using the blastula package, which allows programmatic email generation using SMTP credentials. Thus, a nice-looking, HTML-based email can be generated, and the report zip file is easily attached (see the email sketch after this list).
  4. Checking whether the report has already been made. Rather than remaking the same report repeatedly, we add a simple check before rendering to see if the report already exists in an S3 bucket. If not, we render the report and upload it to the bucket before sending. If the report does exist, then rather than rendering, we simply grab the zip file and continue as if it had been rendered locally. This saves computational expense and speeds up report delivery.
  5. Setting up SQS. If you do not yet have an AWS account, sign up for one; setting up SQS itself is quite simple. There are two queue types to choose from: standard and FIFO. After setup, a queue-specific SQS URL is generated, which is where messages are sent.
  6. Sending the message. The queuing system expects a JSON payload, so we first generate a JSON object and attach it to the message. The JSON contains the arguments that will be converted into parameters for report generation; we build it with the jsonlite package. To send the message, we use the broadly useful paws package, which has functions to interact with this AWS service. One important note: parameters are essentially key:value pairs, but some of the data we want to report on are vectors or lists. Thus, we add a step to flatten a vector into a single string, pass it as part of the JSON object, feed it to the report as a parameter, and finally split it back into a vector on the receiving side. Putting it all together, we have a short script that looks for a message, pulls the JSON out of it, and feeds it to our report generator and emailer functions (see the sender sketch after this list).
  7. Saving report params. Before sending a message, we also save an additional object, in this case a CSV file, that records information about the report; otherwise, after the reports are deleted there is no record of them. One option would be an AWS Lambda function to handle this internally, but we can simply use an S3 bucket dedicated to saving parameter information. It is then easy to write a summary function that pulls these objects out of the bucket and computes summary statistics about the reports made over a given period (see the params-logging sketch after this list).
  8. Consuming the message. Lastly, we delete the message, which erases all traces of it. These messages are ephemeral, think Snapchat, and only serve to deliver a set of arguments from one app to another. After deleting a message, we wrap all of these steps in a repeat loop with a system sleep call so the check repeats at our desired frequency (see the worker sketch after this list). Given the anticipated flux of messages and users on the app, checking for messages every minute, every hour of every day all year, seems like plenty. From another perspective, asking a user to wait an extra minute before their report is emailed is not too long of an ask.
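To make these steps concrete, here are a few hedged sketches in R. First, the render sketch: filling a parameterized template is a single call to the quarto package. The template file name and parameter names here are illustrative, not our exact ones.

    # Render sketch: fill a parameterized Quarto template
    # (template path and parameter names are illustrative)
    library(quarto)

    quarto_render(
      input          = "report_gene.qmd",  # one of the three templates
      execute_params = list(type = "gene", content = "ROCK1"),
      output_file    = "report.html"
    )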
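For the emailing step, a minimal blastula sketch; the addresses, SMTP host, and credential variable names are placeholders.

    # Email sketch using blastula (all SMTP settings are placeholders)
    library(blastula)

    email <- compose_email(
      body = md("Your DDH report is attached.")
    )
    email <- add_attachment(email, file = "report.zip")

    smtp_send(
      email,
      to          = "user@example.com",
      from        = "reports@example.com",
      subject     = "Your DDH report",
      credentials = creds_envvar(
        user        = "reports@example.com",
        pass_envvar = "SMTP_PASSWORD",
        host        = "smtp.example.com",
        port        = 465,
        use_ssl     = TRUE
      )
    )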
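On the sending side, the main app builds a JSON payload with jsonlite and hands it to SQS through paws. The send_report_message() wrapper, its field names, and the DDH_SQS_URL environment variable are assumptions for this sketch.

    # Sender sketch (field names and env var are assumptions)
    library(jsonlite)
    library(paws)

    send_report_message <- function(first_name, email, type, content) {
      params <- list(
        first_name = first_name,
        email      = email,
        type       = type,
        # flatten a possibly multi-element vector into a single string
        content    = paste(content, collapse = ",")
      )
      sqs <- paws::sqs()
      sqs$send_message(
        QueueUrl    = Sys.getenv("DDH_SQS_URL"),
        MessageBody = jsonlite::toJSON(params, auto_unbox = TRUE)
      )
    }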
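For the parameter log, a sketch that writes the report details as a one-row CSV into a dedicated S3 bucket; the bucket name and columns are illustrative.

    # Params-logging sketch (bucket name and columns are illustrative)
    library(paws)

    log_report_params <- function(params) {
      csv_path <- tempfile(fileext = ".csv")
      write.csv(as.data.frame(params), csv_path, row.names = FALSE)
      s3 <- paws::s3()
      s3$put_object(
        Bucket = "ddh-report-params",
        Key    = paste0(format(Sys.time(), "%Y%m%d%H%M%S"), ".csv"),
        Body   = readBin(csv_path, "raw", file.size(csv_path))
      )
    }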
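Finally, the worker sketch: the loop that runs in the report generator container. Here render_report() stands in for report generation (including the S3 existence check and zipping) and email_report() for the blastula send above; both wrappers are assumptions, as is the message field layout.

    # Worker sketch: poll, render (or fetch from S3), email, delete, sleep
    library(paws)
    library(jsonlite)

    sqs       <- paws::sqs()
    queue_url <- Sys.getenv("DDH_SQS_URL")

    repeat {
      resp <- sqs$receive_message(
        QueueUrl            = queue_url,
        MaxNumberOfMessages = 1
      )
      for (msg in resp$Messages) {
        params <- jsonlite::fromJSON(msg$Body)
        # re-expand the flattened vector into its original form
        params$content <- strsplit(params$content, ",")[[1]]
        zip_path <- render_report(params)     # hypothetical wrapper
        email_report(params$email, zip_path)  # hypothetical wrapper
        sqs$delete_message(
          QueueUrl      = queue_url,
          ReceiptHandle = msg$ReceiptHandle
        )
      }
      Sys.sleep(60)  # poll once per minute
    }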

From a user perspective, they fill out a short form with their name, their email, and the details of the report they want. Upon clicking submit, the user does not need to wait or keep their browser open, because a message is sent via SQS to another containerized app that is polling for messages. Upon receiving the message, the containerized app kicks off report generation if that report has not been previously generated. The report is then emailed from the container using SMTP credentials. Within a few minutes, the user receives a zip file containing both an HTML report and a directory of images.

From a technical perspective, the SQS system can receive far more messages than we will ever send. If ten users simultaneously ask for a report, the ten reports are queued and processed sequentially, with none of the requests being dropped. As a side benefit, user email addresses are captured and could be opted into mailings about app updates or other relevant news. For example, when new data are added, previous report requesters could be prompted to generate new reports with the new data.

In summary, by separating the computationally expensive task of report generation from the main app, we can enhance the user experience. A queuing system like SQS gives two containers a simple way to communicate.

B.3.2 Methods

The methods section of DDH is a detailed accounting of the organization, the computational details, and the rationale of the components of the app. It contains many more details than a report does, but shares several of the report's components. Thus, we set out to develop a methods generator that reuses those components.

Methods generation lives in the report generator container, the standalone application described above, separate from the main app. While methods generation might intuitively belong in the data generation container, all of the report sections we want to include in the methods live in the report container, so methods generation lives there too.

Methods organization. To make a complex methods section intuitive, we chose to organize the methods using the same layout as the app. The methods start with some high-level rationale for the project and then detail the types of information that can be generated for each of the query types. Importantly, these are laid out just as if you were navigating the main app.

The methods document leverages a Quarto book layout, with a standalone YAML document that sets how the book will be rendered, its layout, and, importantly, its sections and subsections. Each section is a standalone QMD document; a subsection can be a standalone document or be included within its parent document. In the DDH methods, level-one sections are standalone documents and level-two sections are included documents.
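A minimal sketch of what such a book-level YAML file might look like; the chapter file names are illustrative, not DDH's actual ones.

    # _quarto.yml sketch (chapter file names are illustrative)
    project:
      type: book

    book:
      title: "DDH Methods"
      chapters:
        - index.qmd             # high-level rationale
        - gene_queries.qmd      # level-one section, standalone document
        - cell_queries.qmd
        - compound_queries.qmd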

One of the important distinctions between Rmd and Quarto rendering is that Quarto will knit then merge, whereas Rmd will merge then knit. What this means in practice is that each Quarto document is knitted as a standalone document: it needs its own parameters, its own setup chunks, and its own data.
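Concretely, each chapter QMD therefore carries its own front matter, setup chunk, and data loading, even when it pulls shared children in via the include shortcode. A skeleton, with illustrative file and parameter names:

    ---
    title: "Gene queries"
    params:
      type: "gene"
    ---

    ```{r}
    #| include: false
    source("setup.R")  # each document loads its own functions and data
    ```

    {{< include _correlation_methods.qmd >}}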