AWS S3 Node

AWS S3 is an object storage service offering scalability, data availability, security, and performance. S3 can be used as the storage foundation for a data lake due to its ability to store and retrieve any amount of data from anywhere on the web.

What is the AWS S3 Node?

The AWS S3 node integrates Amazon Simple Storage Service (S3), a highly scalable and durable storage service provided by Amazon Web Services (AWS), into data processing workflows. As an input node, it periodically polls a designated S3 bucket and retrieves the stored files for processing. This is useful when data stored in S3 needs to be ingested into processing pipelines or applications. Because the node polls the bucket at a configured interval, new or updated files are picked up on the next poll and made available for further analysis, processing, or archival.
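The polling behavior described above can be sketched as a simple loop. This is a minimal illustration, not Rayven's implementation: `poll`, `fetch_latest` and `emit` are hypothetical names, emitting a file only when its version changes is an assumption about how "new or updated" data is detected, and `max_cycles` plus the injectable `sleep` exist only so the sketch can run without actually waiting.

```python
import time
from typing import Callable, Hashable, Optional


def poll(
    fetch_latest: Callable[[], Optional[tuple[Hashable, bytes]]],
    emit: Callable[[bytes], None],
    interval_minutes: float,
    max_cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll a source every interval_minutes and emit each new file version.

    Hypothetical sketch: fetch_latest returns (version, content) for the
    latest file, or None when nothing is available. A file is emitted only
    when its version (e.g. key plus last-modified time) differs from the
    previous cycle. Returns the number of payloads emitted.
    """
    if interval_minutes <= 0:
        raise ValueError("interval must be greater than 0 minutes")
    last_version = None
    emitted = 0
    for cycle in range(max_cycles):
        latest = fetch_latest()
        if latest is not None:
            version, content = latest
            if version != last_version:  # new or updated file
                emit(content)
                emitted += 1
                last_version = version
        if cycle < max_cycles - 1:
            sleep(interval_minutes * 60)  # wait before the next poll
    return emitted
```

Passing a fake `fetch_latest` and a `sleep` that just records its argument makes the loop easy to exercise in isolation.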

Outputs

At each configured interval, the AWS S3 node outputs a payload built from the latest file available in the AWS S3 bucket.
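One way to determine "the latest file" is sketched below, assuming a boto3-style S3 client. `list_objects_v2(Bucket=..., Prefix=...)` is a real boto3 S3 client method that returns up to 1,000 keys per call, and each entry in its `Contents` list carries `Key` and `LastModified`. The helper `latest_object`, the `StubS3Client` class, the use of the Folder Name as a key prefix, and the bucket and folder names are hypothetical, for illustration only; this is not a description of how Rayven selects files.

```python
from datetime import datetime
from typing import Any, Optional


def latest_object(client: Any, bucket: str, folder: str) -> Optional[dict]:
    """Return the most recently modified object under folder, or None.

    client is assumed to behave like boto3.client("s3"); only its
    list_objects_v2 method is used, so a stub can stand in for testing.
    """
    prefix = folder.rstrip("/") + "/" if folder else ""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    latest = None
    while True:
        page = client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("/"):
                continue  # skip zero-byte "folder" placeholder keys
            if latest is None or obj["LastModified"] > latest["LastModified"]:
                latest = obj
        if not page.get("IsTruncated"):
            return latest
        # More than one page of results: continue from where S3 stopped.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


class StubS3Client:
    """Hypothetical in-memory stand-in for a boto3 S3 client (testing only)."""

    def __init__(self, objects: list[dict]):
        self.objects = objects

    def list_objects_v2(self, Bucket: str, Prefix: str = "", **_: Any) -> dict:
        contents = [o for o in self.objects if o["Key"].startswith(Prefix)]
        return {"Contents": contents, "IsTruncated": False}


client = StubS3Client([
    {"Key": "exports/", "LastModified": datetime(2024, 1, 5)},
    {"Key": "exports/a.csv", "LastModified": datetime(2024, 1, 1)},
    {"Key": "exports/b.csv", "LastModified": datetime(2024, 1, 2)},
    {"Key": "other/c.csv", "LastModified": datetime(2024, 1, 3)},
])
print(latest_object(client, "my-bucket", "exports")["Key"])  # exports/b.csv
```

Paging through `IsTruncated`/`NextContinuationToken` matters for folders holding more than 1,000 objects; with a single page the loop runs once.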

How to Use

Implementation

In the Rayven Workflow Builder:

Select Inputs.
Drag the AWS S3 node to the canvas.
Double-click the AWS S3 node to open the configuration window.

Configuration

Section: General
This section contains the basic configuration elements required for any AWS S3 node implementation.
Field	Requirement	Comments
Node Name	Mandatory	Enter a name for this node. This provides a handle to which you and others can refer, so it should be simple but meaningful and explain the node’s purpose.
Bucket Name	Mandatory	Enter the name of the bucket located on AWS S3.
Access Key ID	Mandatory	Enter the Access Key ID created in the AWS account for the user the node will connect as. Ensure that this user has permission to list and read objects in the AWS S3 bucket.
Secret Key	Mandatory	Enter the secret access key that was issued together with the Access Key ID.
Folder Name	Mandatory	Enter the name of the folder within the bucket that holds the files.
AWS Region	Mandatory	Select the AWS region in which the bucket is hosted from the dropdown menu.
Interval between File Downloads (minutes)	Mandatory	Enter the time, in minutes, between file downloads. The value must be a number greater than 0.
Payload Format	Mandatory	Select the format of the files in the folder from the dropdown menu: JSON, JSON Array, CSV, CSV(no header row), XML or String.
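To illustrate what the Payload Format options mean, the sketch below turns file content into a list of records for each dropdown value. The function name `parse_payload` and the record shapes are assumptions: in particular, CSV(no header row) rows are returned as plain lists of values and XML is flattened to the root element's direct children, which may not match the structure the node actually produces.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_payload(content: str, payload_format: str) -> list:
    """Hypothetical mapping from file content to a list of records."""
    if payload_format == "JSON":
        return [json.loads(content)]  # one JSON object per file
    if payload_format == "JSON Array":
        data = json.loads(content)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array")
        return data  # one record per array element
    if payload_format == "CSV":
        # The first row supplies the field names.
        return list(csv.DictReader(io.StringIO(content)))
    if payload_format == "CSV(no header row)":
        # No field names available, so each row stays a list of values.
        return [row for row in csv.reader(io.StringIO(content))]
    if payload_format == "XML":
        root = ET.fromstring(content)
        return [{child.tag: child.text for child in root}]
    if payload_format == "String":
        return [content]  # the raw file content, unparsed
    raise ValueError(f"unsupported payload format: {payload_format}")


print(parse_payload("id,temp\nd1,21.5\n", "CSV"))
# [{'id': 'd1', 'temp': '21.5'}]
```

Note that the CSV and XML branches yield string values; any numeric conversion would be a separate step.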

Section: Advanced Features
This section contains optional, advanced configuration elements for the AWS S3 node.
Field	Requirement	Comments
Payload Device ID Field Name	Optional	Enter the name of the payload field that holds the Device ID. If this is not configured, the payload is assigned to the default device __none__, which can be viewed using the debug node.
Filter by File Name	Optional	Define the name of the file(s) to read from the folder. Rayven’s AWS S3 node supports the wildcard character (*), for example data_*.csv.
Variable Name for File Name	Optional	Define the JSON key under which the name of the source file is stored, so that it can be used later in the workflow.
Remove Extension from File Name	Optional	If Variable Name for File Name is used, check this box to remove the extension from the file name (e.g. .csv or .txt).
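The advanced settings can be pictured with the following sketch. The function names are hypothetical and the logic is an assumption about the described behavior, not Rayven's code. The wildcard check uses Python's `fnmatch.fnmatchcase`, which also treats `?` and `[...]` as special, whereas the node is only documented to support `*`. The `__none__` fallback comes from the Payload Device ID Field Name description.

```python
import fnmatch
import posixpath
from typing import Optional

DEFAULT_DEVICE_ID = "__none__"  # device used when no Device ID field is set


def matches_file_filter(key: str, pattern: Optional[str]) -> bool:
    """Check an S3 key's file name against an optional wildcard pattern."""
    if not pattern:
        return True  # no filter configured: every file qualifies
    file_name = posixpath.basename(key)  # S3 keys use "/" as separator
    return fnmatch.fnmatchcase(file_name, pattern)


def apply_advanced_settings(
    record: dict,
    key: str,
    device_id_field: Optional[str] = None,
    file_name_variable: Optional[str] = None,
    remove_extension: bool = False,
) -> tuple[str, dict]:
    """Return (device_id, record) with the hypothetical settings applied."""
    out = dict(record)  # leave the caller's record untouched
    if file_name_variable:
        file_name = posixpath.basename(key)
        if remove_extension:
            file_name = posixpath.splitext(file_name)[0]  # "data.csv" -> "data"
        out[file_name_variable] = file_name
    value = out.get(device_id_field) if device_id_field else None
    device_id = str(value) if value is not None else DEFAULT_DEVICE_ID
    return device_id, out
```

For example, `apply_advanced_settings({"t": 3}, "exports/data_01.csv")` returns `("__none__", {"t": 3})`, because no Device ID field or file-name variable is configured.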

Section: Payload Date Settings
This section allows the user to define the field that holds the timestamp of the data.
Field	Requirement	Comments
Date Field Name	Optional	Define the name of the field in the incoming data that holds the date.
Date Format	Optional	Define the format of the date values in that field (e.g. dd/MM/yyyy HH:mm:ss).
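The example format dd/MM/yyyy HH:mm:ss appears to use Java/.NET-style pattern letters, where upper-case MM is the month and lower-case mm is the minute. The sketch below assumes only these six tokens and translates them into Python `strptime` directives to show how such a pattern is read; `to_strptime` and `parse_timestamp` are hypothetical helpers, not the node's parser, and any other pattern letters the node may accept are not handled.

```python
import re
from datetime import datetime

# Assumed subset of pattern tokens and their strptime equivalents.
TOKENS = {
    "yyyy": "%Y",  # 4-digit year
    "MM": "%m",    # month (upper case)
    "dd": "%d",    # day of month
    "HH": "%H",    # hour, 24-hour clock
    "mm": "%M",    # minute (lower case)
    "ss": "%S",    # second
}


def to_strptime(pattern: str) -> str:
    """Translate e.g. 'dd/MM/yyyy HH:mm:ss' into '%d/%m/%Y %H:%M:%S'."""
    return re.sub("yyyy|MM|dd|HH|mm|ss", lambda m: TOKENS[m.group(0)], pattern)


def parse_timestamp(value: str, pattern: str) -> datetime:
    """Parse a date field value using the configured Date Format pattern."""
    return datetime.strptime(value, to_strptime(pattern))


print(parse_timestamp("25/12/2023 14:05:09", "dd/MM/yyyy HH:mm:ss"))
# 2023-12-25 14:05:09
```

Mixing up MM and mm is a common source of silently wrong timestamps, which is why the case of each letter matters in the Date Format field.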

Activation

Once the node has been configured, click the Activate button and then click Save. The node is then ready to process data from the AWS S3 bucket.