AWS S3 Node

AWS S3 is an object storage service offering scalability, data availability, security, and performance. S3 can be used as the storage foundation for a data lake due to its ability to store and retrieve any amount of data from anywhere on the web.

What is the AWS S3 Node?

The AWS S3 node integrates Amazon Simple Storage Service (S3) into data processing workflows. S3 is a highly scalable and durable storage service provided by Amazon Web Services (AWS). As an input node, it polls a designated S3 bucket at a configured interval and retrieves the stored data for processing. This is useful when data stored in S3 needs to be ingested into processing pipelines or applications: each poll captures new or updated data and makes it available to the workflow for further analysis, processing, or archival.
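The polling behaviour described above can be pictured with a minimal sketch. This is not Rayven's implementation: `poll`, `fetch_latest`, `max_polls`, and the injectable `sleep` are hypothetical names introduced for illustration, and the sketch simply assumes that one payload is produced per poll and that the interval must be greater than 0.

```python
import time
from typing import Any, Callable, Iterator


def poll(
    fetch_latest: Callable[[], Any],
    interval_minutes: float,
    max_polls: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Any]:
    """Yield one payload per poll, waiting interval_minutes between polls.

    fetch_latest is a hypothetical callable that reads the bucket and
    returns the payload for that poll.
    """
    if interval_minutes <= 0:
        raise ValueError("interval must be greater than 0 minutes")
    for poll_number in range(max_polls):
        yield fetch_latest()
        # Wait before the next poll, but not after the final one.
        if poll_number < max_polls - 1:
            sleep(interval_minutes * 60)
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting; a real input node would run until deactivated rather than for a fixed number of polls.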

Outputs

At each configured interval, the AWS S3 node outputs a payload based on the latest file available in the AWS S3 bucket.
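As a rough illustration of "latest file", the sketch below picks the most recently modified object under a folder. It is an assumption that "latest" means the newest modification time; the documentation does not say how the node decides. The listing uses dictionaries with `Key` and `LastModified`, mirroring the `Contents` entries returned by boto3's `list_objects_v2`, and `latest_object` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def latest_object(listing: List[Dict], folder: str) -> Optional[Dict]:
    """Return the most recently modified object under folder, or None."""
    prefix = folder.strip("/") + "/"
    candidates = [
        obj for obj in listing
        # Skip the placeholder key that equals the folder prefix itself.
        if obj["Key"].startswith(prefix) and obj["Key"] != prefix
    ]
    return max(candidates, key=lambda obj: obj["LastModified"], default=None)


# Hypothetical listing, as a single poll might see it.
listing = [
    {"Key": "readings/", "LastModified": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"Key": "readings/a.csv", "LastModified": datetime(2024, 1, 2, tzinfo=timezone.utc)},
    {"Key": "readings/b.csv", "LastModified": datetime(2024, 1, 3, tzinfo=timezone.utc)},
    {"Key": "other/c.csv", "LastModified": datetime(2024, 1, 9, tzinfo=timezone.utc)},
]
newest = latest_object(listing, "readings")
```

Here `newest` refers to `readings/b.csv`: the newer file in `other/` is ignored because it lies outside the configured folder.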

How to Use

Implementation

In the Rayven Workflow Builder:

  • Select Inputs.
  • Drag the AWS S3 node to the canvas.
  • Double-click the AWS S3 node to open the configuration window.

Configuration

Section: General

This section contains the basic configuration elements required for any AWS S3 node implementation.

| Field | Requirement | Comments |
|---|---|---|
| Node Name | Mandatory | Enter a name for this node. This provides a handle to which you and others can refer, so it should be simple but meaningful and explain the node’s purpose. |
| Bucket Name | Mandatory | Enter the name of the bucket located on AWS S3. |
| Access Key ID | Mandatory | Enter the Access Key ID that was configured on the AWS account. Ensure that the user this key belongs to has access to the AWS S3 bucket. |
| Secret Key | Mandatory | Enter the secret key that was configured on the AWS account. |
| Folder Name | Mandatory | Enter the name of the folder within the bucket that holds the files. |
| AWS Region | Mandatory | Select the region where the AWS bucket is hosted from the dropdown menu. |
| Interval between File Downloads (minutes) | Mandatory | Enter a numeric value that is greater than 0. |
| Payload Format | Mandatory | Select the format of the files in the folder from the dropdown menu: JSON, JSON Array, CSV, CSV (no header row), XML, or String. |
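To make the Payload Format options more concrete, here is a minimal sketch of how file contents in each format could be turned into data using only the Python standard library. The way the node itself parses files is not documented here, so this is illustrative only; `parse_payload` and the lowercase format identifiers are hypothetical stand-ins for the dropdown labels.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_payload(text: str, payload_format: str):
    """Parse file contents according to a (hypothetical) format identifier."""
    if payload_format == "json":
        return json.loads(text)  # a single JSON document
    if payload_format == "json_array":
        records = json.loads(text)
        if not isinstance(records, list):
            raise ValueError("expected a JSON array")
        return records
    if payload_format == "csv":
        # The first row supplies the field names.
        return list(csv.DictReader(io.StringIO(text)))
    if payload_format == "csv_no_header":
        # No field names are available, so each row stays a list of values.
        return list(csv.reader(io.StringIO(text)))
    if payload_format == "xml":
        # Flatten the direct children of the root element into a dict.
        root = ET.fromstring(text)
        return {child.tag: child.text for child in root}
    if payload_format == "string":
        return text
    raise ValueError(f"unsupported payload format: {payload_format}")
```

Note that the CSV and XML branches return every value as a string; any type conversion would have to happen later in the workflow.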


Section: Advanced Features

This section contains advanced, optional configuration elements for the AWS S3 node.

| Field | Requirement | Comments |
|---|---|---|
| Payload Device ID Field Name | Optional | Enter the name of the field that holds the Device ID. If this is not configured, the payload is tied to the device __none__ by default, which can be viewed using the debug node. |
| Filter by File Name | Optional | Define the name of the file that should be read from the folder. Rayven’s AWS S3 node supports the use of wildcards (*). |
| Variable Name for File Name | Optional | Define the JSON key that will hold the file name as its value, for use in the workflow. |
| Remove Extension from File Name | Optional | If Variable Name for File Name is used, check this box to remove the extension from the file name (e.g. .csv, .txt). |
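The advanced options above can be sketched as follows. This is an illustration under assumptions rather than the node's actual logic: `matches_filter` and `enrich` are hypothetical helpers, the filter is assumed to apply to the file name without its folder path, and Python's `fnmatch` also understands `?` and `[...]`, whereas this documentation only mentions `*`.

```python
import fnmatch
import posixpath
from typing import Dict, Optional, Tuple


def matches_filter(key: str, pattern: str) -> bool:
    """Compare the file name (without its folder path) with a wildcard pattern."""
    return fnmatch.fnmatchcase(posixpath.basename(key), pattern)


def enrich(
    record: Dict,
    key: str,
    file_name_variable: Optional[str] = None,
    remove_extension: bool = False,
    device_id_field: Optional[str] = None,
) -> Tuple[str, Dict]:
    """Return (device_id, payload) for one record read from the file at key."""
    file_name = posixpath.basename(key)
    if remove_extension:
        file_name = posixpath.splitext(file_name)[0]  # "site_01.csv" -> "site_01"
    payload = dict(record)
    if file_name_variable:
        payload[file_name_variable] = file_name
    # Fall back to the default device when no Device ID field is configured
    # or the record does not contain it.
    device_id = record.get(device_id_field, "__none__") if device_id_field else "__none__"
    return device_id, payload
```

For example, with the filter `site_*.csv`, the key `readings/site_01.csv` matches, and enabling extension removal stores `site_01` under the configured variable name.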



Section: Payload Date Settings

This section allows the user to define the field that holds the timestamp of the data.

| Field | Requirement | Comments |
|---|---|---|
| Date Field Name | Optional | Define the name of the date field coming in from your device. |
| Date Format | Optional | Define the format in which the date values will arrive (e.g. dd/MM/yyyy HH:mm:ss). |
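The example pattern dd/MM/yyyy HH:mm:ss resembles the letter-based date patterns used by Java and .NET, although this documentation does not state which syntax the node follows. Under that assumption, the sketch below translates a small subset of such tokens into Python `strptime` directives and parses a sample value; `to_strptime` and the token table are hypothetical and cover only the tokens in the example, plus a two-digit year.

```python
from datetime import datetime

# Longer tokens come first so "yyyy" is not consumed as two "yy" tokens.
TOKENS = [
    ("yyyy", "%Y"), ("yy", "%y"), ("MM", "%m"), ("dd", "%d"),
    ("HH", "%H"), ("mm", "%M"), ("ss", "%S"),
]


def to_strptime(pattern: str) -> str:
    """Translate a dd/MM/yyyy-style pattern into strptime directives."""
    out = []
    i = 0
    while i < len(pattern):
        for token, directive in TOKENS:
            if pattern.startswith(token, i):
                out.append(directive)
                i += len(token)
                break
        else:
            # Copy separators unchanged; escape a literal percent sign.
            out.append("%%" if pattern[i] == "%" else pattern[i])
            i += 1
    return "".join(out)


fmt = to_strptime("dd/MM/yyyy HH:mm:ss")
parsed = datetime.strptime("25/12/2023 14:30:00", fmt)
```

The case of the letters matters in this style of pattern: MM is the month and mm is the minute, so swapping them produces wrong timestamps or parse failures.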

Activation 

Once the node has been configured, click the Activate button and then click Save. The node is then ready to process data from the AWS S3 bucket.