1 Preface

What is Spring Cloud Data Flow? Although it has been around for a while, many people don’t know about it, because few projects use it and not much information can be found online.

Spring Cloud Data Flow is a microservices-based framework specialized for streaming and batch data processing.

2 Basic concepts

2.1 Data Processing Modes

There are two modes of data processing: Streaming and Batch. A Streaming application is long-running and message-driven: it processes data as it arrives and waits when there is none. A Batch application has a shorter lifecycle: it starts, processes once, then exits, and must be triggered each time it runs.

Generally, we develop Streaming applications with the Spring Cloud Stream framework and Batch applications with the Spring Cloud Task or Spring Batch framework. Once developed, an application can be packaged in two forms.

  • Spring Boot-style jar packages, which can be hosted in Maven repositories, file directories, or on HTTP servers.
  • Docker images.
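For example, when an application is registered with the Server later on, its artifact is referenced by a URI; a Maven jar and a Docker image look like this (the coordinates below are illustrative):

```
maven://org.springframework.cloud.task.app:timestamp-task:2.1.1.RELEASE
docker:springcloudtask/timestamp-task:2.1.1.RELEASE
```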

For Stream, there are three concepts that need to be understood.

  • Source : a message producer, responsible for sending messages to a target.
  • Sink : a message consumer, responsible for reading messages from a target.
  • Processor : a combination of Source and Sink, which consumes messages from one target and sends them to another.
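In Spring Cloud Stream’s functional programming model, these three roles map onto the plain java.util.function types: a Supplier is a Source, a Function is a Processor, and a Consumer is a Sink. The sketch below illustrates that mapping in plain Java with no messaging middleware; the class and variable names are made up for illustration.

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

public class StreamRoles {

    // Source: produces messages
    static final Supplier<String> source = () -> "hello";
    // Processor: transforms a message and passes it on
    static final Function<String, String> processor = s -> s.toUpperCase();
    // Sink: consumes messages (here, collecting them into a buffer)
    static final StringBuilder received = new StringBuilder();
    static final Consumer<String> sink = received::append;

    public static void main(String[] args) {
        // In Spring Cloud Data Flow the messaging middleware wires these
        // together; here we simply compose them directly.
        sink.accept(processor.apply(source.get()));
        System.out.println(received); // prints HELLO
    }
}
```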

2.2 Features

Spring Cloud Data Flow has many features that make it worth learning.

  • A cloud-based architecture that can be deployed on Cloud Foundry, Kubernetes or OpenShift, etc.
  • A number of optional out-of-the-box stream processing and batch application components.
  • Customizable application components based on the Spring Boot-style programming model.
  • Simple and flexible DSL (Domain Specific Language) to define task processing logic.
  • Beautiful Dashboard to visually define processing logic, manage applications, manage tasks, etc.
  • A REST API, plus a shell for interacting from the command line.
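As a taste of the DSL: a Stream pipes applications together with a Unix-style `|`, while a Task definition simply names the application to run, optionally with properties (the definitions below are illustrative):

```
http --server.port=9000 | log
timestamp --format=yyyy-MM-dd
```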

2.3 Server-side components

The server side has two important components: Data Flow Server and Skipper Server. The two have different roles and collaborate with each other.

[Diagram: Data Flow Server and Skipper Server]

The main roles of the Data Flow Server are:

  • Parsing the DSL.
  • Verifying and persisting the definitions of Stream, Task and Batch.
  • Registering applications, whether jar packages or Docker images.
  • Deploying Batch to one or more platforms.
  • Querying the execution history of Jobs and Batches.
  • Configuration management of Stream.
  • Distribution of Stream deployments to Skipper.
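These functions are all exposed through the Data Flow Server’s REST API. For example, with a Server running on the default port, the task and job execution history could be queried with something like:

```shell
# assumes a Data Flow Server running on localhost:9393
curl http://localhost:9393/tasks/executions
curl http://localhost:9393/jobs/executions
```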

The main roles of the Skipper Server are:

  • Deploying Stream to one or more platforms.
  • Updating or rolling back Streams using a blue-green update strategy.
  • Saving the description information for each Stream.

As you can see, if you don’t need Stream, you can skip deploying the Skipper Server. Both servers rely on a relational database (RDBMS) and use the embedded H2 by default; H2, HSQLDB, MySQL, Oracle, PostgreSQL, DB2 and SQL Server are supported.
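Switching away from the embedded H2 is done with the standard Spring Boot datasource properties. A sketch for MySQL (the database name and credentials below are illustrative):

```shell
# illustrative: point the Data Flow Server at MySQL instead of the embedded H2
java -jar spring-cloud-dataflow-server-2.5.3.RELEASE.jar \
  --spring.datasource.url="jdbc:mysql://localhost:3306/dataflow" \
  --spring.datasource.username=root \
  --spring.datasource.password=secret \
  --spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
```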

2.4 Runtime environment

Spring components are always well decoupled, and the Servers and applications can run on different platforms. We can deploy the Data Flow Server and Skipper Server on Local, Cloud Foundry or Kubernetes, and the Server can deploy applications to a different platform.

  • Server on Local: applications on Local, Cloud Foundry or Kubernetes.
  • Server on Cloud Foundry: applications on Cloud Foundry or Kubernetes.
  • Server on Kubernetes: applications on Cloud Foundry or Kubernetes.

In general, we deploy the Server and the applications on the same platform. For production environments, Kubernetes is the more appropriate choice.

3 Local mode installation and use

For a quick experience, we use the simplest local runtime environment.

3.1 Downloading Jar Packages

Download the following three jar packages.

wget https://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-server/2.5.3.RELEASE/spring-cloud-dataflow-server-2.5.3.RELEASE.jar
wget https://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-shell/2.5.3.RELEASE/spring-cloud-dataflow-shell-2.5.3.RELEASE.jar
wget https://repo.spring.io/release/org/springframework/cloud/spring-cloud-skipper-server/2.4.3.RELEASE/spring-cloud-skipper-server-2.4.3.RELEASE.jar

If you only want to run a simple Batch application, downloading spring-cloud-dataflow-server-2.5.3.RELEASE.jar is enough.

3.2 Starting the application

# Start Skipper; the default port is 7577
java -jar spring-cloud-skipper-server-2.4.3.RELEASE.jar
# Start the Data Flow Server; the default port is 9393
java -jar spring-cloud-dataflow-server-2.5.3.RELEASE.jar

After startup completes, open your browser and visit http://localhost:9393/dashboard to see the dashboard UI.

[Screenshot: Dashboard UI]

3.3 Deploying applications

3.3.1 Adding Applications

Batch and Stream definitions can only be deployed after applications have been added. Spring provides official sample applications that we can use directly.
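If you prefer not to click through the UI, the same bulk import can be done from the Data Flow shell; the URI below is the official list of task applications as Maven artifacts:

```
dataflow:>app import --uri https://dataflow.spring.io/task-maven-latest
```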

[Screenshot: importing the official sample applications]

After a successful import, the applications appear in the application list.

[Screenshot: application list]

3.3.2 Creating a Task

A Task can be created either graphically or through the DSL, both of which are very convenient.
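For example, a Task based on the official timestamp application can be defined from the shell like this (the task name and format value are illustrative):

```
dataflow:>task create --name my-timestamp --definition "timestamp --format=yyyy-MM-dd"
```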

[Screenshot: creating a Task]

After defining the Task, enter a name to create it.

[Screenshot: Task definition]

3.3.3 Running a Task

Click to run it directly.

[Screenshot: running a Task]

Parameters can be passed in.

[Screenshot: passing parameters]
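From the shell, the equivalent launch with properties and command-line arguments looks like this (the values are illustrative):

```
dataflow:>task launch my-timestamp --properties "app.timestamp.format=yyyy-MM-dd" --arguments "--debug"
```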

3.3.4 Viewing Task Runs

[Screenshot: Task execution list]

You can view the run log.

[Screenshot: run log]

3.4 Data Flow Shell Command Line (CLI)

In addition to the web UI, you can also interact with the Server from the command line.

Start the shell application:

$ java -jar spring-cloud-dataflow-shell-2.5.3.RELEASE.jar 
  ____                              ____ _                __
 / ___| _ __  _ __(_)_ __   __ _   / ___| | ___  _   _  __| |
 \___ \| '_ \| '__| | '_ \ / _` | | |   | |/ _ \| | | |/ _` |
  ___) | |_) | |  | | | | | (_| | | |___| | (_) | |_| | (_| |
 |____/| .__/|_|  |_|_| |_|\__, |  \____|_|\___/ \__,_|\__,_|
  ____ |_|    _          __|___/                 __________
 |  _ \  __ _| |_ __ _  |  ___| | _____      __  \ \ \ \ \ \
 | | | |/ _` | __/ _` | | |_  | |/ _ \ \ /\ / /   \ \ \ \ \ \
 | |_| | (_| | || (_| | |  _| | | (_) \ V  V /    / / / / / /
 |____/ \__,_|\__\__,_| |_|   |_|\___/ \_/\_/    /_/_/_/_/_/

2.5.3.RELEASE

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
Successfully targeted http://localhost:9393/

dataflow:>app list
╔═══╤══════╤═════════╤════╤════════════════════╗
║app│source│processor│sink│        task        ║
╠═══╪══════╪═════════╪════╪════════════════════╣
║   │      │         │    │composed-task-runner║
║   │      │         │    │timestamp-batch     ║
║   │      │         │    │timestamp           ║
╚═══╧══════╧═════════╧════╧════════════════════╝

dataflow:>
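By default the shell targets http://localhost:9393. To point it at another Server, pass the URI at startup (the host name below is illustrative):

```shell
java -jar spring-cloud-dataflow-shell-2.5.3.RELEASE.jar --dataflow.uri=http://my-server:9393
```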

4 Summary

This article used the official sample applications; we can also develop our own applications and register them with the Server. Local mode is suitable for development environments, while production deployments are more reliable on Kubernetes. We’ll explore that in a later article.

Reference https://segmentfault.com/a/1190000023465341