
How to Merge Data from Multiple Sources in One Workflow

Combine multiple data inputs into a single, unified workflow for integrated processing and analysis.

Overview

In Rayven, you can merge data from multiple sources—such as APIs, SQL databases, files, IoT feeds, or user input forms—into a single workflow for consolidated processing, transformation, and output. This approach enables you to:

  • Correlate related datasets in real time.

  • Enrich one dataset with fields from another.

  • Standardise different formats into a consistent structure.

  • Drive unified interface visualisations and reports.

To merge data, you connect the right input nodes, use logic nodes to transform or combine the data, and add table lookups if you need to align identifiers and formats.


Planning the Merge

In Rayven, “merging data” can mean different things depending on your use case:

  • Real-time workflows: Multiple data sources are ingested and processed together against the same UID in the Primary Table. This ensures that all relevant data for that entity (e.g., asset, customer, site) is consolidated for live processing, calculation, and storage in the Cassandra time-series database.

  • Non-real-time workflows: Data may be collected from different sources, normalised, and written into the appropriate structured table. This could be:

    • Primary Table — storing core attributes against the UID for that entity.

    • Secondary Table — storing related records against a key column that links back to a Primary Table UID or another unique identifier.

The approach you take depends on:

  • The timing of your sources (live vs. batch).

  • The storage target (workflow data in Cassandra, structured tables, or both).

  • How the data will be linked (matching on UID, label values, or other identifiers).

Merging in Rayven is often about bringing together different streams or datasets so they can be processed as a unified payload—whether for real-time visualisation, automated logic, or structured record-keeping.


Steps to Merge Multiple Data Sources in One Workflow

1. Add Input Nodes for Each Source

Add the connectors or triggers that will bring in your data. These might be:

  • Connectors (e.g., API, SQL, HTTP, MQTT) for external systems.

  • Trigger nodes for scheduled or event-based execution.

If the sources run in parallel (e.g., MQTT streaming and a periodic HTTP fetch), add them as separate branches and merge them later in the workflow. If one must run after the other, connect them in sequence.

2. Align the Data Structure

Before the data can be meaningfully merged, standardise it into a consistent structure.

  • Use Advanced Function nodes to rename keys, adjust data types, or map values to a common schema.

  • Use JavaScript nodes for more flexible manipulation—combining fields, restructuring payloads, or appending contextual data from one source to another (see the sketch after this list).

  • If only certain records are needed, use Conditional Filter nodes to remove irrelevant data.
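
As a rough illustration of this kind of mapping, the sketch below shows how a JavaScript node could rename keys and coerce types from two differently shaped sources into one common schema. The field names, and the assumption that the node works on a plain payload object and returns the transformed object, are illustrative only; adapt them to your own sources and to the node's actual input/output convention.

```javascript
// Illustrative only: field names and the payload-in / payload-out convention
// are assumptions for this example, not the exact Rayven node contract.
function normaliseApiPayload(payload) {
  return {
    uid: String(payload.deviceId),           // rename deviceId -> uid
    temperature: Number(payload.temp_c),     // coerce string to number, common key name
    timestamp: new Date(payload.readingTime).toISOString()
  };
}

function normaliseSqlPayload(row) {
  return {
    uid: String(row.asset_uid),
    temperature: Number(row.temperature_celsius),
    timestamp: new Date(row.recorded_at).toISOString()
  };
}

// Example usage: two differently shaped inputs, one common schema out.
console.log(normaliseApiPayload({ deviceId: 42, temp_c: "21.5", readingTime: 1712000000000 }));
console.log(normaliseSqlPayload({ asset_uid: "42", temperature_celsius: 21.4, recorded_at: "2024-04-01T12:00:00Z" }));
```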

3. Merge the Data Streams

Merging can be done in different ways depending on your goal:

A. For real-time workflows (Cassandra storage and processing)

  • Use JavaScript and/or Combine Data nodes to bring together multiple payloads into a single JSON object before associating it with a UID (a sketch follows this list).

  • If the payload isn’t already linked to the correct UID, you can use Associate Payload with Correct UID to match it to the right Primary Table record for storage and processing.

  • Combining first ensures both datasets have matching timestamps and are processed as one event, which is useful when timing and correlation matter.
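
As a minimal sketch of the "combine first, then associate" pattern, the snippet below merges two already-normalised payloads into one object carrying a shared UID and a single timestamp. It assumes both payloads use the common schema from the previous step; the merge itself is a plain object spread, with the sensor fields winning on key clashes. This is an illustration of the concept, not the Combine Data node's behaviour.

```javascript
// Minimal sketch: merge two normalised payloads into one event for a single UID.
// Assumes both payloads carry the same uid and comparable ISO timestamps.
function mergeForUid(sensorPayload, weatherPayload) {
  if (sensorPayload.uid !== weatherPayload.uid) {
    throw new Error("Payloads refer to different UIDs; do not merge");
  }
  return {
    ...weatherPayload,   // enrich with weather fields
    ...sensorPayload,    // sensor fields take precedence on clashes
    uid: sensorPayload.uid,
    // One timestamp for the combined event (here: the more recent of the two).
    timestamp: sensorPayload.timestamp > weatherPayload.timestamp
      ? sensorPayload.timestamp
      : weatherPayload.timestamp
  };
}

// Example usage
const merged = mergeForUid(
  { uid: "pump-07", timestamp: "2024-04-01T12:00:05Z", temperature: 21.5 },
  { uid: "pump-07", timestamp: "2024-04-01T12:00:00Z", ambientTemp: 18.2, humidity: 0.61 }
);
console.log(merged);
```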


B. For structured table updates (Primary or Secondary Tables)

  • You don’t always need to combine payloads first. You can update the same table row in separate workflow branches as long as you use the same UID or key column.

  • This allows you to update different columns as the relevant data arrives, without waiting for both sources.

  • Combining first is optional, but can be done if you want to reduce node executions or ensure that changes to multiple columns happen at the same time.
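
The sketch below shows the idea of updating the same row from separate branches: each branch shapes only the columns it knows about, keyed by the same UID. The column names, and the assumption that the Update Tables node maps payload keys to columns by name, are illustrative rather than the exact node configuration.

```javascript
// Illustrative: each branch produces a partial update for the same row,
// keyed by the same UID. Column names are assumptions for this example.
function shapeMaintenanceUpdate(payload) {
  return { uid: payload.uid, last_service_date: payload.serviceDate };
}

function shapeTelemetryUpdate(payload) {
  return { uid: payload.uid, last_reading: payload.temperature };
}

// Example usage: two branches, two partial updates, one target row.
console.log(shapeMaintenanceUpdate({ uid: "pump-07", serviceDate: "2024-03-28" }));
console.log(shapeTelemetryUpdate({ uid: "pump-07", temperature: 21.5 }));
```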

4. Perform Post-Merge Processing

After combining, you may want to:

  • Apply calculations using Formula or Aggregation nodes.

  • Add business logic with Rule Builder.

  • Validate merged data before writing to storage.
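
As one example of post-merge validation in a JavaScript node, the sketch below checks that the merged payload has the fields and types the downstream table expects and reports any problems. The required field list is an assumption chosen for illustration.

```javascript
// Minimal validation sketch: confirm the merged payload has the expected
// fields and types before it is written to storage. Field list is illustrative.
function validateMerged(payload) {
  const errors = [];
  if (typeof payload.uid !== "string" || payload.uid.length === 0) {
    errors.push("missing or invalid uid");
  }
  if (typeof payload.temperature !== "number" || Number.isNaN(payload.temperature)) {
    errors.push("temperature is not a number");
  }
  if (Number.isNaN(Date.parse(payload.timestamp))) {
    errors.push("timestamp is not a valid date");
  }
  return { valid: errors.length === 0, errors, payload };
}

// Example usage
console.log(validateMerged({ uid: "pump-07", temperature: 21.5, timestamp: "2024-04-01T12:00:05Z" }));
console.log(validateMerged({ uid: "", temperature: "21.5", timestamp: "not-a-date" }));
```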

5. Store or Output the Merged Data

Decide where the merged dataset should go:

  • Update Tables node to save into a Primary or Secondary Table.

  • Frontend nodes to send directly to interfaces.

  • Output connectors (e.g., Output to API, Output to Email) to send data to external systems.

6. (Optional) Handle Asynchronous Sources

When data from different sources arrives at different times, you have a few options:

  • Combine Data node — Can be configured to merge payloads even when they arrive at different times, using methods such as Combine Latest Received, Combine With Same Timestamp, or Combine With Latest Timestamp. This ensures that when both datasets are available, they are output together as a single payload (the Combine Latest Received idea is sketched after this list).

  • Queue node — Not for combining data, but for regulating the rate at which payloads pass through the workflow. Useful if many individual payloads are received in a short timeframe and you want to avoid overwhelming downstream logic or connectors.

  • Other Conditional logic — Can be used to merge or process data only when all required inputs are present in the workflow.
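
To make the Combine Latest Received idea concrete, the sketch below keeps the latest payload seen from each source and emits a merged object once every expected source has reported at least once. This is only a conceptual illustration of the behaviour; the Combine Data node handles this for you, and its internals are not shown here.

```javascript
// Conceptual illustration of "combine latest received": keep the newest payload
// per source and emit a merged object once every expected source has arrived.
// Not the Combine Data node's implementation, just the idea behind it.
function makeLatestCombiner(sourceNames) {
  const latest = {};
  return function receive(sourceName, payload) {
    latest[sourceName] = payload;
    const ready = sourceNames.every(name => name in latest);
    if (!ready) return null; // still waiting on at least one source
    // Merge the latest payload from each source into a single object.
    return sourceNames.reduce((merged, name) => ({ ...merged, ...latest[name] }), {});
  };
}

// Example usage
const combine = makeLatestCombiner(["mqtt", "http"]);
console.log(combine("mqtt", { uid: "pump-07", temperature: 21.5 }));  // null, still waiting for http
console.log(combine("http", { uid: "pump-07", ambientTemp: 18.2 }));  // merged payload
```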


Best Practices

  • Match your merge method to your outcome – For real-time workflows, merging into a single payload associated with a UID before processing ensures data is stored together in Cassandra and timestamped as one event. For structured table updates, you can update rows incrementally without combining payloads first.

  • Use JavaScript for flexibility – JavaScript nodes give you fine control over mapping, restructuring, and combining payloads, and can be used alone or alongside Combine Data.

  • Leverage Combine Data for async handling – If your inputs arrive at different times, the Combine Data node can align them using methods such as Combine Latest Received or Combine With Same Timestamp.

  • Keep identifiers consistent – Ensure all sources have the correct UID or key column value before merging or updating tables. Use Associate Payload with Correct UID if needed to align the payload with the right Primary Table record.

  • Avoid unnecessary combinations – Don’t merge payloads unless it serves a purpose, such as correlation accuracy, efficiency, or processing logic. Unnecessary merging can add complexity without benefit.

  • Inspect at key points – Use the Inspect Data tab in nodes to verify field names, formats, and values before and after merging, so you catch issues early.

  • Filter before you merge – Remove irrelevant or noisy data with Conditional Filter or logic in JavaScript nodes before combining, to keep payloads lean and meaningful.

  • Consider performance impact – Large merged payloads may increase processing time. If only some fields are needed downstream, strip out unused values early.



Summary

Merging multiple data sources in a single Rayven workflow allows you to unify datasets for richer insights and streamlined processing. By combining connectors, transformation nodes, and merge logic, you can create a single flow that ingests, standardises, and outputs data in the desired format—all without leaving the Workflow Builder.


FAQs

Can I merge more than two data sources?
Yes. You can bring together as many sources as needed, using JavaScript and/or Combine Data nodes. You can chain multiple merges or enrich payloads with Query Tables.

Do I always need to combine data before storing it?
No. For structured table updates, you can update the same row in separate branches without combining payloads first. Combining is more important in real-time workflows when you want all data stored together in Cassandra as one event.

When should I use Associate Payload with Correct UID?
Only when the payload is not already linked to the correct UID. If your incoming data already includes the right UID or key column value, this step isn’t necessary.

What’s the difference between using JavaScript and Combine Data to merge?

  • JavaScript gives you maximum control—ideal for mapping, restructuring, or conditional merging.

  • Combine Data is designed for joining payloads automatically, including aligning asynchronous inputs using methods like Combine Latest Received.
Often, they’re used together: JavaScript to prepare the data, Combine Data to merge it.

Can I merge live and historical data?
Yes. You can merge live data from connectors with historical data fetched from a table using Query Tables, then combine them in JavaScript or Combine Data before output or storage.
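
As a brief illustration of this enrichment pattern, the sketch below attaches a historical value (as it might be returned earlier in the workflow by Query Tables) to a live reading before output. The record shape and field names are assumptions made for the example.

```javascript
// Illustration only: enrich a live reading with a historical value fetched
// earlier in the workflow (e.g., via Query Tables). Field names are assumptions.
function enrichWithHistory(livePayload, historicalRecord) {
  return {
    ...livePayload,
    historicalAverage: historicalRecord.avg_temperature,
    deviationFromAverage: livePayload.temperature - historicalRecord.avg_temperature
  };
}

// Example usage
console.log(enrichWithHistory(
  { uid: "pump-07", temperature: 23.1, timestamp: "2024-04-01T12:00:05Z" },
  { uid: "pump-07", avg_temperature: 21.0 }
));
```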