AWS S3 is an object storage service offering scalability, data availability, security, and performance. S3 can be used as the storage foundation for a data lake due to its ability to store and retrieve any amount of data from anywhere on the web.
What is AWS S3?
The AWS S3 node integrates Amazon Simple Storage Service (S3) into data processing workflows. S3 is a highly scalable and durable storage service provided by Amazon Web Services (AWS). As an input node, it polls designated S3 buckets at a specified interval and retrieves the stored data for processing. This is useful when data stored in S3 needs to be ingested into processing pipelines or applications: each poll captures new or updated data and makes it available for further analysis, processing, or archival.
Outputs
The AWS S3 node outputs a payload based on the latest file available in the AWS S3 bucket, once per configured interval.
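As a rough illustration of what "latest file" selection can look like, the sketch below picks the most recently modified object under a folder prefix. It is not Rayven's implementation: the function name `latest_object`, the sample keys, and the assumption that "latest" means the newest last-modified time are all hypothetical. The dictionaries mimic the `Key` and `LastModified` fields returned by S3 object listings.

```python
from datetime import datetime, timezone


def latest_object(objects, prefix=""):
    """Return the most recently modified object under `prefix`, or None."""
    candidates = [
        obj for obj in objects
        # Skip keys outside the folder and zero-byte "folder" placeholder keys.
        if obj["Key"].startswith(prefix) and not obj["Key"].endswith("/")
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda obj: obj["LastModified"])


# Hypothetical listing, shaped like the entries of an S3 object listing.
listing = [
    {"Key": "readings/", "LastModified": datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)},
    {"Key": "readings/2024-01-01.csv", "LastModified": datetime(2024, 1, 1, 0, 5, tzinfo=timezone.utc)},
    {"Key": "readings/2024-01-02.csv", "LastModified": datetime(2024, 1, 2, 0, 5, tzinfo=timezone.utc)},
    {"Key": "archive/old.csv", "LastModified": datetime(2024, 1, 3, 0, 5, tzinfo=timezone.utc)},
]

newest = latest_object(listing, prefix="readings/")
print(newest["Key"])
```

Objects outside the folder are ignored even when they are newer, which matches the idea of reading from a single configured folder.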
How to Use
Implementation
In the Rayven Workflow Builder:
- Select Inputs.
- Drag the AWS S3 node to the canvas.
- Double click on the AWS S3 node to open the configuration window.
Configuration
Section: General

This section contains the basic configuration elements required for any AWS S3 node implementation.

| Field | Requirement | Comments |
| --- | --- | --- |
| Node Name | Mandatory | Enter a name for this node. This provides a handle to which you and others can refer, so it should be simple but meaningful and explain the node’s purpose. |
| Bucket Name | Mandatory | Enter the name of the bucket located on AWS S3. |
| Access Key ID | Mandatory | Enter the Access Key ID that was configured on the AWS account. Ensure that the user this key belongs to has access to the AWS S3 bucket. |
| Secret Key | Mandatory | Enter the secret key that was configured on the AWS account. |
| Folder Name | Mandatory | Enter the name of the folder within the bucket that holds the files. |
| AWS Region | Mandatory | Select the region where the AWS bucket is hosted from the dropdown menu. |
| Interval between File Downloads (minutes) | Mandatory | Enter the number of minutes between downloads. This must be a numeric value greater than 0. |
| Payload Format | Mandatory | Select the format of the files in the folder from the dropdown menu. |
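The access requirement on the Access Key ID can be made concrete with an IAM policy. The sketch below builds a minimal read-only policy for one bucket and folder. The exact permissions the Rayven node needs are not stated in this document, so the choice of `s3:ListBucket` and `s3:GetObject` is an assumption based on what polling and downloading typically require. The function name `read_only_policy` and the bucket and folder names are hypothetical.

```python
import json


def read_only_policy(bucket, folder):
    """Build an IAM policy document allowing list and read on one folder."""
    prefix = folder.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself, limited to the folder.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Reading is granted on the objects inside the folder.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
        ],
    }


# Hypothetical bucket and folder names, for illustration only.
print(json.dumps(read_only_policy("example-bucket", "readings"), indent=2))
```

Scoping the key to a single folder keeps the credentials entered in the node from exposing the rest of the bucket.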
Section: Advanced Features

This section contains advanced configuration elements for the AWS S3 node. All fields are optional.

| Field | Requirement | Comments |
| --- | --- | --- |
| Payload Device ID Field Name | Optional | Enter the name of the field that holds the Device ID. If this is not configured, the payload is tied to the device __none__ by default, which can be viewed using the debug node. |
| Filter by File Name | Optional | Define the file name that should be read from the folder. Rayven’s AWS S3 node supports the use of wildcards (*). |
| Variable Name for File Name | Optional | Define the JSON key that will hold the file name as its value, for use in the workflow. |
| Remove Extension from File Name | Optional | If Variable Name for File Name is used, check this box to remove the extension from the file name (e.g. .csv, .txt). |
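The file name options above can be sketched in a few lines. This only illustrates the behaviour the fields describe and is not how the node is implemented. The helper names, the sample file names, and the `file_name` key are hypothetical. Python's `fnmatch` also treats `?` and `[...]` as special characters, whereas only `*` is documented for the node.

```python
import fnmatch
import os


def matching_files(file_names, pattern):
    """Keep the file names that match a wildcard pattern such as 'sensor_*.csv'."""
    return [name for name in file_names if fnmatch.fnmatchcase(name, pattern)]


def file_name_value(file_name, remove_extension):
    """Return the value to store under the file-name key in the payload."""
    if remove_extension:
        # splitext drops only the final extension, e.g. '.csv'.
        return os.path.splitext(file_name)[0]
    return file_name


# Hypothetical folder contents.
names = ["sensor_01.csv", "sensor_02.csv", "notes.txt"]

selected = matching_files(names, "sensor_*.csv")
payload = {"file_name": file_name_value(selected[0], remove_extension=True)}
print(selected)
print(payload)
```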
Section: Payload Date Settings

This section allows the user to define the field that holds the timestamp of the data.

| Field | Requirement | Comments |
| --- | --- | --- |
| Date Field Name | Optional | Define the name of the date field in the data coming in from your device. |
| Date Format | Optional | Define the format in which the date will arrive (e.g. dd/MM/yyyy HH:mm:ss). |
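To show how a date format such as dd/MM/yyyy HH:mm:ss lines up with an incoming value, the sketch below parses a sample timestamp. It assumes the pattern letters carry their usual meaning (dd = day, MM = month, yyyy = year, HH = 24-hour hour, mm = minutes, ss = seconds) and translates that one pattern to Python's `strptime` directives by hand. The field name `reading_time`, the sample record, and the helper `parse_timestamp` are hypothetical.

```python
from datetime import datetime

# Assumed translation of the documented example pattern to strptime directives.
PATTERN_TO_STRPTIME = {
    "dd/MM/yyyy HH:mm:ss": "%d/%m/%Y %H:%M:%S",
}


def parse_timestamp(value, pattern="dd/MM/yyyy HH:mm:ss"):
    """Parse a date string written in the given pattern."""
    return datetime.strptime(value, PATTERN_TO_STRPTIME[pattern])


# Hypothetical record: 'reading_time' plays the role of Date Field Name.
record = {"reading_time": "05/03/2024 14:30:00", "temperature": 21.5}

timestamp = parse_timestamp(record["reading_time"])
print(timestamp.isoformat())  # 2024-03-05T14:30:00
```

With this pattern the day comes first, so 05/03/2024 is read as 5 March, not 3 May. A mismatch between the configured format and the actual data is a common cause of wrong or rejected timestamps.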
Activation
Once the node has been configured, click the Activate button and then click Save. The node is then ready to process data from the AWS S3 bucket.