chore:update readme

rulego-team
2025-04-14 22:53:55 +08:00
parent a480881323
commit 3eed15673e
3 changed files with 121 additions and 17 deletions
+60 -8
@@ -2,14 +2,26 @@
English| [简体中文](README_ZH.md)
**StreamSQL** is a lightweight, embedded stream SQL processing library. It splits unbounded stream data into bounded data chunks using window functions and supports operations such as aggregation, data transformation, and filtering.
**StreamSQL** is a lightweight, SQL-based stream processing engine for the IoT edge that enables efficient processing and analysis of unbounded data streams.
Similar to [Apache Flink](https://flink.apache.org/) and [eKuiper](https://ekuiper.org/).
## Features
- Supports multiple window types: Sliding Window, Tumbling Window, Count Window
- Supports aggregate functions: MAX, MIN, AVG, SUM, STDDEV, MEDIAN, PERCENTILE, etc.
- Supports group-by aggregation
- Supports filtering conditions
- Lightweight
  - Pure in-memory operations
  - No dependencies
- Data processing with SQL syntax
- Data analysis
  - Multiple built-in window types: sliding window, tumbling window, count window
  - Built-in aggregate functions: MAX, MIN, AVG, SUM, STDDEV, MEDIAN, PERCENTILE, etc.
  - Support for group-by aggregation
  - Support for filtering conditions
- High extensibility
  - Flexible function extensions
  - Integration with the **RuleGo** ecosystem to extend input and output sources using **RuleGo** components
- Integration with [RuleGo](https://gitee.com/rulego/rulego)
  - Use the rich and flexible input, output, and processing components of **RuleGo** to connect data sources and integrate with third-party systems
## Installation
@@ -17,7 +29,7 @@ English| [简体中文](README_ZH.md)
go get github.com/rulego/streamsql
```
## Usage Example
## Usage
```go
package main
@@ -35,8 +47,8 @@ import (
func main() {
ssql := streamsql.New()
// Define the SQL statement. TumblingWindow is a tumbling window that rolls every 5 seconds
rsql := "SELECT deviceId,avg(temperature) as max_temp,min(humidity) as min_humidity ," +
// Define the SQL statement. Every 5 seconds, group by deviceId and output the average temperature and minimum humidity of the device.
rsql := "SELECT deviceId,avg(temperature) as avg_temp,min(humidity) as min_humidity ," +
"window_start() as start,window_end() as end FROM stream where deviceId!='device3' group by deviceId,TumblingWindow('5s')"
// Create a stream processing task based on the SQL statement.
err := ssql.Execute(rsql)
@@ -91,7 +103,47 @@ func main() {
wg.Wait()
}
```
## Concepts
### Windows
Since stream data is unbounded, it cannot be processed as a whole. Windows provide a mechanism to divide unbounded data into a series of bounded data segments for computation. StreamSQL includes the following types of windows:
- **Sliding Window**
  - **Definition**: A time-based window that slides forward at fixed time intervals. For example, it slides every 10 seconds.
  - **Characteristics**: The size of the window is fixed, but its starting point advances continuously over time. It is suitable for real-time statistical analysis of data within continuous time periods.
  - **Application Scenario**: In intelligent transportation systems, count the vehicle traffic of the past 1 minute, updated every 10 seconds.
- **Tumbling Window**
  - **Definition**: A time-based window whose instances do not overlap and are completely independent of each other. For example, a new window is generated every 1 minute.
  - **Characteristics**: The size of the window is fixed, and windows do not overlap. It is suitable for analyzing all data within a fixed time period as a whole.
  - **Application Scenario**: In smart agriculture monitoring systems, aggregate the temperature and humidity of the farmland once per hour, over that hour.
- **Count Window**
  - **Definition**: A window based on the number of data records, where the window size is determined by the record count. For example, a window is generated every 100 data records.
  - **Characteristics**: The size of the window is unrelated to time and is determined by the volume of data. It is suitable for processing data in fixed-size batches.
  - **Application Scenario**: In industrial IoT, perform an aggregation each time 100 device status records have been processed.
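The three window types above can be illustrated with a small, standalone Go sketch that decides which window(s) a record belongs to. This is not StreamSQL's implementation: the function names are hypothetical, timestamps are plain Unix milliseconds, and aligning time windows to the Unix epoch is an assumption made only for this example.

```go
package main

import "fmt"

// All timestamps and durations are non-negative Unix milliseconds.
// Epoch-aligned windows are an assumption of this sketch.

// tumblingStart returns the start of the single tumbling window containing ts.
func tumblingStart(ts, size int64) int64 {
	return ts - ts%size
}

// slidingStarts returns the starts of all sliding windows [s, s+size) that
// contain ts, newest first. Windows begin at multiples of slide.
func slidingStarts(ts, size, slide int64) []int64 {
	var starts []int64
	for s := ts - ts%slide; s >= 0 && s > ts-size; s -= slide {
		starts = append(starts, s)
	}
	return starts
}

// countWindowIndex returns which count window the n-th record (0-based) falls into.
func countWindowIndex(n, count int) int {
	return n / count
}

func main() {
	fmt.Println(tumblingStart(12345, 5000))          // 10000
	fmt.Println(slidingStarts(65000, 60000, 10000)) // [60000 50000 40000 30000 20000 10000]
	fmt.Println(countWindowIndex(250, 100))          // 2
}
```

A record belongs to exactly one tumbling window and one count window, but to `size/slide` sliding windows (six in the example: a 1-minute window sliding every 10 seconds).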
### Stream
- **Definition**: A continuous sequence of data that is generated in an unbounded manner, typically from sensors, log systems, user behaviors, etc.
- **Characteristics**: Stream data is real-time, dynamic, and unbounded, requiring timely processing and analysis.
- **Application Scenario**: Real-time data streams generated by IoT devices, such as temperature sensor data and device status data.
### Time Semantics
- **Event Time**
  - **Definition**: The actual time when the data occurred, usually represented by a timestamp generated by the data source.
- **Processing Time**
  - **Definition**: The time when the data arrives at the processing system.
- **Window Start Time**
  - **Definition**: The starting time point of the window based on event time. For example, for a sliding window based on event time, the window start time is the timestamp of the earliest event within the window.
- **Window End Time**
  - **Definition**: The ending time point of the window based on event time. Typically, the window end time is the window start time plus the duration of the window. For example, if the duration of a sliding window is 1 minute, then the window end time is the window start time plus 1 minute.
## Contribution Guidelines
Pull requests and issues are welcome. Please ensure that the code conforms to Go standards and includes relevant test cases.
+60 -8
@@ -2,14 +2,26 @@
[English](README.md)| 简体中文
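**StreamSQL** is a lightweight, embedded stream SQL processing library. It uses window functions to split unbounded data into bounded chunks and supports aggregation, data transformation, and filtering.
**StreamSQL** is a lightweight, SQL-based stream processing engine for the IoT edge. It processes and analyzes unbounded data efficiently.
Similar to [Apache Flink](https://flink.apache.org/) and [eKuiper](https://ekuiper.org/).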
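## Features
- Supports multiple window types: sliding window, tumbling window, count window
- Supports aggregate functions: MAX, MIN, AVG, SUM, STDDEV, MEDIAN, PERCENTILE, etc.
- Supports group-by aggregation
- Supports filtering conditions
- Lightweight
  - Pure in-memory operations
  - No dependencies
- Data processing with SQL syntax
- Data analysis
  - Multiple built-in window types: sliding window, tumbling window, count window
  - Built-in aggregate functions: MAX, MIN, AVG, SUM, STDDEV, MEDIAN, PERCENTILE, etc.
  - Support for group-by aggregation
  - Support for filtering conditions
- High extensibility
  - Flexible function extensions
  - Integration with the `RuleGo` ecosystem, using `RuleGo` components to extend input and output sources
- Integration with [RuleGo](https://gitee.com/rulego/rulego)
  - Use the rich and flexible input, output, and processing components of `RuleGo` to connect data sources and interact with third-party systems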
## Installation
@@ -17,7 +29,7 @@
go get github.com/rulego/streamsql
```
## Usage Example
## Usage
```go
package main
@@ -34,8 +46,8 @@ import (
func main() {
ssql := streamsql.New()
// Define the SQL statement. TumblingWindow is a tumbling window that rolls every 5 seconds
rsql := "SELECT deviceId,avg(temperature) as max_temp,min(humidity) as min_humidity ," +
// Define the SQL statement. Meaning: every 5 seconds, group by deviceId and output the average temperature and minimum humidity of each device.
rsql := "SELECT deviceId,avg(temperature) as avg_temp,min(humidity) as min_humidity ," +
"window_start() as start,window_end() as end FROM stream where deviceId!='device3' group by deviceId,TumblingWindow('5s')"
// Create a stream processing task based on the SQL statement.
err := ssql.Execute(rsql)
@@ -91,6 +103,46 @@ func main() {
}
```
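## Concepts
### Windows
Because stream data is unbounded, it cannot be processed as a whole. Windows provide a mechanism for dividing unbounded data into a series of consecutive bounded segments for computation. StreamSQL has the following built-in window types: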
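- **Sliding Window**
  - **Definition**: A time-based window that slides forward at fixed time intervals. For example, it slides every 10 seconds.
  - **Characteristics**: The size of the window is fixed, but its starting point advances continuously over time. It is suitable for real-time statistical analysis of data within continuous time periods.
  - **Application Scenario**: In intelligent transportation systems, count the vehicle traffic of the past 1 minute, updated every 10 seconds.
- **Tumbling Window**
  - **Definition**: A time-based window whose instances do not overlap and are completely independent of each other. For example, a new window is generated every 1 minute.
  - **Characteristics**: The size of the window is fixed, and windows do not overlap. It is suitable for analyzing all data within a fixed time period as a whole.
  - **Application Scenario**: In smart agriculture monitoring systems, aggregate the temperature and humidity of the farmland once per hour, over that hour.
- **Count Window**
  - **Definition**: A window based on the number of data records, where the window size is determined by the record count. For example, a window is generated every 100 data records.
  - **Characteristics**: The size of the window is unrelated to time and is determined by the volume of data. It is suitable for processing data in fixed-size batches.
  - **Application Scenario**: In industrial IoT, perform an aggregation each time 100 device status records have been processed.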
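### Stream
- **Definition**: A stream is a continuous sequence of data that is generated in an unbounded manner, typically by sensors, log systems, user behavior, etc.
- **Characteristics**: Stream data is real-time, dynamic, and unbounded, and requires timely processing and analysis.
- **Application Scenario**: Real-time data streams generated by IoT devices, such as temperature sensor data and device status data.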
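### Time Semantics
- **Event Time**
  - **Definition**: The actual time when the data occurred, usually represented by a timestamp generated by the data source.
- **Processing Time**
  - **Definition**: The time when the data arrives at the processing system.
- **Window Start Time**
  - **Definition**: The starting time point of the window based on event time. For example, for a sliding window based on event time, the window start time is the timestamp of the earliest event within the window.
- **Window End Time**
  - **Definition**: The ending time point of the window based on event time. Typically, the window end time is the window start time plus the duration of the window.
  - For example, if the duration of a sliding window is 1 minute, the window end time is the window start time plus 1 minute.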
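## Contribution Guidelines
Pull requests and issues are welcome. Please ensure that the code conforms to Go standards and includes relevant test cases.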
+1 -1
@@ -16,7 +16,7 @@ import (
func TestStreamData(t *testing.T) {
ssql := New()
// Define the SQL statement. TumblingWindow is a tumbling window that rolls every 5 seconds
rsql := "SELECT deviceId,avg(temperature) as max_temp,min(humidity) as min_humidity ," +
rsql := "SELECT deviceId,avg(temperature) as avg_temp,min(humidity) as min_humidity ," +
"window_start() as start,window_end() as end FROM stream where deviceId!='device3' group by deviceId,TumblingWindow('5s')"
// Create a stream processing task based on the SQL statement.
err := ssql.Execute(rsql)