11. Basic usage of InfluxDB

Basic concepts

InfluxDB ingests data in line protocol: each line represents one data point.

weather,location=us-midwest temperature=82 1465839830100400200

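The line above follows the pattern:
measurement,tag_set field_set timestamp

weather is the measurement
location=us-midwest is the tag_set, a set of key-value pairs
temperature=82 is the field_set, a set of key-value pairs
1465839830100400200 is the timestamp (2016-06-13T17:43:50.1004002Z)

Notes:
--A single space separates the measurement and tag_set from the field_set, and another separates the field_set from the timestamp
--The timestamp is a Unix timestamp in nanoseconds. If it is omitted, InfluxDB uses the server's nanosecond timestamp in UTC. In a cluster, the servers' clocks must be synchronized, otherwise the stored timestamps will be inaccurate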
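
To check a nanosecond timestamp by hand, it can be converted to an RFC3339 string. The following is a minimal Python sketch; the helper name ns_to_rfc3339 is made up for illustration and is not part of any InfluxDB client.

```python
from datetime import datetime, timezone


def ns_to_rfc3339(ns: int) -> str:
    """Convert a Unix nanosecond timestamp to an RFC3339 string in UTC."""
    seconds, nanos = divmod(ns, 1_000_000_000)
    # datetime only stores microseconds, so the nanosecond part is formatted by hand.
    base = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return f"{base}.{nanos:09d}Z"


print(ns_to_rfc3339(1465839830100400200))  # 2016-06-13T17:43:50.100400200Z
```

The printed value is the timestamp of the weather example, with trailing zeros kept.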

Examples:
--weather,location=us-midwest,season=summer temperature=82 1465839830100400200
--weather,location=us-midwest temperature=82,humidity=71 1465839830100400200

Data types

In the tag_set, tag values are always strings. InfluxDB cannot do arithmetic on them, so a tag value cannot be passed as an argument to an InfluxQL function.

Timestamps are Unix timestamps. The minimum is -9223372036854775806 (1677-09-21T00:12:43.145224194Z) and the maximum is 9223372036854775806 (2262-04-11T23:47:16.854775806Z). The default precision is nanoseconds; a different precision can be specified through the API.

Field values can be float, integer, string, or boolean.
--weather,location=us-midwest temperature=82 1465839830100400200 (82 is stored as a float)
--weather,location=us-midwest temperature=82i 1465839830100400200 (the trailing i makes 82 an integer)
--weather,location=us-midwest temperature="too warm" 1465839830100400200 (too warm is stored as a string)
--weather,location=us-midwest too_hot=true 1465839830100400200 (true is a boolean; true can be written t, T, true, True, or TRUE, and false can be written f, F, false, False, or FALSE)
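
The type rules above can be sketched as a small Python formatter that turns a Python value into its line protocol field value. The function name format_field_value is hypothetical, and the mapping from Python types to InfluxDB types is an assumption made for this example.

```python
def format_field_value(value) -> str:
    """Render a Python value as a line protocol field value."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # the trailing i marks an integer
    if isinstance(value, float):
        return repr(value)  # a bare number is read as a float
    if isinstance(value, str):
        # Strings are double-quoted; backslashes and double quotes are escaped.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


print(format_field_value(82.0))        # 82.0
print(format_field_value(82))          # 82i
print(format_field_value("too warm"))  # "too warm"
print(format_field_value(True))        # true
```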

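Writing a different field type for the same field within one shard raises an error:
--INSERT weather,location=us-midwest temperature=82 1465839830100400200
--INSERT weather,location=us-midwest temperature=82i 1465839830100400300
ERR: {"error":"field type conflict: input field \"temperature\" on measurement \"weather\" is type int64, already exists as type float"}

Writing a different field type into a different shard does not raise an error (here the second point is about 15 days later, so with the default 7-day shard group duration it lands in another shard):
--INSERT weather,location=us-midwest temperature=82 1465839830100400200
--INSERT weather,location=us-midwest temperature=82i 1467154750000000000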

Quoting

Do not double-quote timestamps:
--INSERT weather,location=us-midwest temperature=82 "1465839830100400200"
ERR: {"error":"unable to parse 'weather,location=us-midwest temperature=82 \"1465839830100400200\"': bad timestamp"}

Do not single-quote field values:
--INSERT weather,location=us-midwest temperature='too warm'
ERR: {"error":"unable to parse 'weather,location=us-midwest temperature='too warm'': invalid boolean"}

Do not single-quote or double-quote measurement names, tag keys, tag values, or field keys. It does not raise an error, but InfluxDB treats the quotes as part of the name:
--INSERT "weather",location=us-midwest temperature=87 1465839830100400200
--SHOW MEASUREMENTS
--lists "weather", quotes included
--which makes queries awkward: SELECT * FROM "\"weather\""

Do not double-quote numeric field values; InfluxDB stores them as strings:
--INSERT weather,location=us-midwest temperature="82"

Special characters

Escape a comma with \:
weather,location=us\,midwest temperature=82 1465839830100400200

Escape an equals sign with \:
weather,location=us-midwest temp\=rature=82 1465839830100400200

Escape a space with \:
weather,location\ place=us-midwest temperature=82 1465839830100400200

Escape a comma in the measurement name with \:
wea\,ther,location=us-midwest temperature=82 1465839830100400200

Escape a space in the measurement name with \:
wea\ ther,location=us-midwest temperature=82 1465839830100400200

Escape double quotes inside a string field value with \:
weather,location=us-midwest temperature="too\"hot\"" 1465839830100400200
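
The escaping rules above can be combined into a small line builder. This is a sketch under stated assumptions: the function names are hypothetical, field values are passed in already formatted (for example "82" or "82i"), and tags are sorted by key purely as a convention of this example.

```python
def _escape(text: str, chars: str) -> str:
    """Put a backslash in front of every character listed in chars."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def escape_measurement(name: str) -> str:
    return _escape(name, ", ")  # commas and spaces


def escape_key(text: str) -> str:
    return _escape(text, ",= ")  # commas, equals signs, and spaces


def build_line(measurement, tags, fields, timestamp_ns=None) -> str:
    """Assemble measurement,tag_set field_set timestamp."""
    head = escape_measurement(measurement)
    for key, value in sorted(tags.items()):
        head += f",{escape_key(key)}={escape_key(value)}"
    field_set = ",".join(f"{escape_key(k)}={v}" for k, v in fields.items())
    line = f"{head} {field_set}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


print(build_line("weather", {"location": "us,midwest"},
                 {"temperature": "82"}, 1465839830100400200))
```

The call prints the first escaped example of this section: weather,location=us\,midwest temperature=82 1465839830100400200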

How / and \ behave inside string field values:
--weather,location=us-midwest temperature_str="too hot/cold" 1465839830100400201
--weather,location=us-midwest temperature_str="too hot\cold" 1465839830100400202
--weather,location=us-midwest temperature_str="too hot\\cold" 1465839830100400203
--weather,location=us-midwest temperature_str="too hot\\\cold" 1465839830100400204
--weather,location=us-midwest temperature_str="too hot\\\\cold" 1465839830100400205
--weather,location=us-midwest temperature_str="too hot\\\\\cold" 1465839830100400206

> SELECT * FROM "weather"
name: weather
time                location        temperature_str
1465839830100400201 us-midwest      too hot/cold
1465839830100400202 us-midwest      too hot\cold
1465839830100400203 us-midwest      too hot\cold   (two backslashes are stored as one)
1465839830100400204 us-midwest      too hot\\cold  (three are stored as two)
1465839830100400205 us-midwest      too hot\\cold  (four are stored as two)
1465839830100400206 us-midwest      too hot\\\cold (five are stored as three)
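
The query result above is consistent with a simple rule: reading left to right, a backslash followed by another backslash or by a double quote is an escape pair and collapses to the second character, while a lone backslash is kept. The Python sketch below models that rule. It is inferred from the rows shown here and is not InfluxDB's actual parser; the name stored_string is hypothetical.

```python
import re


def stored_string(raw: str) -> str:
    """Model how the text between the double quotes of a string field is stored."""
    # \\ becomes \ and \" becomes "; any other backslash is left untouched.
    return re.sub(r'\\([\\"])', r"\1", raw)


for count in range(1, 6):
    written = "too hot" + "\\" * count + "cold"
    stored = stored_string(written)
    print(count, "written ->", stored.count("\\"), "stored")
```

The loop reports 1, 1, 2, 2, and 3 stored backslashes for 1 to 5 written backslashes.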

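Keywords

time is a keyword, but it can still be used as the name of a database, measurement, retention policy, subscription, or user. It cannot be used as a tag key or field key.

Aggregation

An aggregation is an InfluxQL function that computes a result over a set of points.


==COUNT()
> SELECT COUNT("water_level") FROM "h2o_feet"
Returns the number of non-null water_level field values in the h2o_feet measurement.

> SELECT COUNT(*) FROM "h2o_feet"
Returns the number of non-null field values for every field key in the h2o_feet measurement.

> SELECT COUNT(/water/) FROM "h2o_feet"
Returns the number of non-null field values for every field key in h2o_feet that contains the word water.

> SELECT COUNT("water_level") FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:54:00Z' GROUP BY time(12m),* fill(200) LIMIT 7 SLIMIT 1
Restricts the time range, groups results into 12-minute intervals and by every tag, fills empty intervals with 200, and returns at most 7 points from at most 1 series.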

InfluxQL-Basics

Connecting to and exiting the database

$ .\influx -precision rfc3339
Connected to http://localhost:8086 version 1.7.7
InfluxDB shell version: 1.7.1

The RFC3339 timestamp format is YYYY-MM-DDTHH:MM:SS.nnnnnnnnnZ

$ exit

Creating a database

  • Run influxd.exe
  • Start the influx shell: ./influx -precision rfc3339
  • Create the database: $ CREATE DATABASE NOAA_water_database

Downloading the test data and writing it to the local database

Download the data:
$ curl https://s3.amazonaws.com/noaa.water-database/NOAA_data.txt -o NOAA_data.txt
This adds a NOAA_data.txt file to the current directory.

Import it into the local database:
$ ./influx -import -path=NOAA_data.txt -precision=s -database=NOAA_water_database
This fails with: unknown arguments: .txt -precision=s
As a workaround, in the directory that contains `influx.exe`, rename `NOAA_data.txt` to `NOAA_data` and run the import again:
$ ./influx -import -path=NOAA_data -precision=s -database=NOAA_water_database

Connect to the database:
$ ./influx -precision rfc3339 -database NOAA_water_database

List all tables, that is, measurements:
$ SHOW measurements

InfluxQL-Data exploration

Queries

Count the non-null values of a field

SELECT COUNT("water_level") FROM h2o_feet

Select the first few points

SELECT * FROM h2o_feet LIMIT 5

Select all fields and tags

SELECT * FROM "h2o_feet"

Select specific tags and fields

$ ./influx -precision rfc3339
$ USE NOAA_water_database
$ SELECT "level description","location","water_level" FROM "h2o_feet"

Select tags and fields, with the type stated explicitly

SELECT "level description"::field,"location"::tag,"water_level"::field FROM "h2o_feet"

Select all fields

SELECT *::field FROM "h2o_feet"

Basic arithmetic on a field

SELECT ("water_level" * 2) + 4 from "h2o_feet"

Query data from multiple measurements

select * from "h2o_feet","h2o_pH"

Query a fully qualified measurement (database, retention policy, measurement)

select * from "NOAA_water_database"."autogen"."h2o_feet"

Query all data of a measurement in a given database (default retention policy)

select * from "NOAA_water_database".."h2o_feet"

A query that selects a tag must also select at least one field

select "water_level","location" from "h2o_feet"

Filtering

WHERE clause syntax

Operators supported on fields:
field_key <operator> ['string' | boolean | float | integer]
= <> != > >= < <=

Operators supported on tags:
tag_key <operator> ['tag_value']
= <> !=

Filter by field value

select * from "h2o_feet" where "water_level">8

Filter by the string value of a field

select * from "h2o_feet" where "level description" = 'below 3 feet'

Filter by a computed expression

select * from "h2o_feet" where "water_level" + 2 > 11.9

Filter by tag value

select "water_level" from "h2o_feet" where "location" = 'santa_monica'

Filter by tag and field

select "water_level" from "h2o_feet" where "location" <> 'santa_monica' and (water_level < -0.59 OR water_level > 9.95)

Filter by timestamp

select * from h2o_feet where time > now() - 7d

Grouping

Group by a tag

select MEAN(water_level) from h2o_feet group by location
Groups the points by location, then returns the mean water_level of each group.

Group by multiple tags

select MEAN(index) from h2o_quality group by location,randtag

Group by all tags

select MEAN(index) from h2o_quality group by *

Group by time interval

SELECT COUNT("water_level") FROM "h2o_feet" WHERE "location"='coyote_creek' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(12m)

Group by time interval and tag

SELECT COUNT("water_level") FROM "h2o_feet" WHERE time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(12m),"location"

Group by time interval and shift the interval boundaries forward with an offset

SELECT MEAN("water_level") FROM "h2o_feet" WHERE "location"='coyote_creek' AND time >= '2015-08-18T00:06:00Z' AND time <= '2015-08-18T00:54:00Z' GROUP BY time(18m,6m)
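
To see what the offset in GROUP BY time(18m,6m) does, the sketch below computes which interval a timestamp falls into. It assumes that interval boundaries are multiples of the interval counted from the Unix epoch and that the offset shifts them forward; the helper names bucket_start and to_epoch are hypothetical.

```python
from datetime import datetime, timezone


def to_epoch(text: str) -> int:
    """Parse an RFC3339 UTC string (whole seconds) into epoch seconds."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def bucket_start(ts: int, interval: int, offset: int = 0) -> int:
    """Start of the GROUP BY time(interval, offset) bucket that contains ts."""
    return (ts - offset) // interval * interval + offset


point = to_epoch("2015-08-18T00:12:00Z")
midnight = to_epoch("2015-08-18T00:00:00Z")

# time(18m): boundaries at 00:00, 00:18, 00:36, ... so 00:12 belongs to 00:00.
print((bucket_start(point, 18 * 60) - midnight) // 60)           # 0
# time(18m,6m): boundaries at 00:06, 00:24, 00:42, ... so 00:12 belongs to 00:06.
print((bucket_start(point, 18 * 60, 6 * 60) - midnight) // 60)   # 6
```

With the offset, the first bucket starts at 00:06, which matches the lower bound of the WHERE clause in the query above.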

GROUP BY combined with fill()

> SELECT MAX("water_level") FROM "h2o_feet" WHERE "location"='coyote_creek' AND time >= '2015-09-18T16:00:00Z' AND time <= '2015-09-18T16:42:00Z' GROUP BY time(12m) fill(100)

INTO

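Copying a database into a new database

A database cannot be renamed. The only option is to create a new database and copy the data into it with the INTO clause.

SELECT * INTO "copy_NOAA_water_database"."autogen".:MEASUREMENT FROM "NOAA_water_database"."autogen"./.*/ GROUP BY *

:MEASUREMENT means that every measurement in the source database is written to a measurement of the same name in the new database.

autogen is the retention policy. It must exist in both the source database and the new database, otherwise the INTO query cannot run.

GROUP BY * is essential: it copies the tags of every measurement in NOAA_water_database as tags into copy_NOAA_water_database. Without it, the tags of the source measurements become fields in copy_NOAA_water_database.

Steps:
--Create the new database: create database copy_NOAA_water_database
--Switch to the source database: use NOAA_water_database
--Copy the data with INTO: SELECT * INTO "copy_NOAA_water_database"."autogen".:MEASUREMENT FROM "NOAA_water_database"."autogen"./.*/ GROUP BY *
--Switch to the new database: use copy_NOAA_water_database
--List the measurements of the new database: show measurements
--Check that the new database contains data: select * from h2o_feet LIMIT 5

For large data volumes, copy step by step, one measurement and one time range at a time: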

SELECT *
INTO <destination_database>.<retention_policy_name>.<measurement_name>
FROM <source_database>.<retention_policy_name>.<measurement_name>
WHERE time > now() - 100w and time < now() - 90w GROUP BY *

SELECT *
INTO <destination_database>.<retention_policy_name>.<measurement_name>
FROM <source_database>.<retention_policy_name>.<measurement_name>
WHERE time > now() - 90w  and time < now() - 80w GROUP BY *

SELECT *
INTO <destination_database>.<retention_policy_name>.<measurement_name>
FROM <source_database>.<retention_policy_name>.<measurement_name>
WHERE time > now() - 80w  and time < now() - 70w GROUP BY *
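
Writing the windowed statements by hand gets tedious, so they can be generated. The Python sketch below only builds the query strings and does not send them anywhere; the function name and its parameters are hypothetical, and the window sizes are the ones used above.

```python
def chunked_copy_statements(src, dst, rp, measurement,
                            oldest_weeks, newest_weeks, step_weeks):
    """Build one INTO statement per time window, oldest window first."""
    statements = []
    start = oldest_weeks
    while start > newest_weeks:
        end = max(start - step_weeks, newest_weeks)
        statements.append(
            f'SELECT * INTO "{dst}"."{rp}"."{measurement}" '
            f'FROM "{src}"."{rp}"."{measurement}" '
            f"WHERE time > now() - {start}w AND time < now() - {end}w GROUP BY *"
        )
        start = end
    return statements


for statement in chunked_copy_statements(
        "NOAA_water_database", "copy_NOAA_water_database",
        "autogen", "h2o_feet", 100, 70, 10):
    print(statement)
```

Because now() is evaluated when each statement runs, the window edges drift slightly between statements; absolute timestamps in the WHERE clause would avoid that.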

Writing query results to a measurement

SELECT "water_level" INTO "h2o_feet_copy_1" FROM "h2o_feet" WHERE "location" = 'coyote_creek'

Sorting

Descending by time:
SELECT "water_level" FROM "h2o_feet" WHERE "location" = 'santa_monica' ORDER BY time DESC

Sorting grouped results:
SELECT MEAN("water_level") FROM "h2o_feet" WHERE time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:42:00Z' GROUP BY time(12m) ORDER BY time DESC

LIMIT and SLIMIT

Limit the number of points returned:
SELECT "water_level","location" FROM "h2o_feet" LIMIT 3

Limit the number of series returned:
SELECT "water_level" FROM "h2o_feet" GROUP BY * SLIMIT 1

OFFSET and SOFFSET

Return the 4th, 5th, and 6th points:
SELECT "water_level","location" FROM "h2o_feet" LIMIT 3 OFFSET 3

Return the 1st, 2nd, and 3rd points:
SELECT "water_level","location" FROM "h2o_feet" LIMIT 3

Combine LIMIT, OFFSET, and SLIMIT on grouped results:
SELECT MEAN("water_level") FROM "h2o_feet" WHERE time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:42:00Z' GROUP BY *,time(12m) ORDER BY time DESC LIMIT 2 OFFSET 2 SLIMIT 1

Return the second series:
SELECT "water_level" FROM "h2o_feet" GROUP BY * SLIMIT 1 SOFFSET 1
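
LIMIT with OFFSET behaves like a slice over the ordered result, and SLIMIT with SOFFSET does the same over the list of series. The Python sketch below is an analogy only; the point numbers are stand-ins for real rows and the function name is hypothetical.

```python
def limit_offset(rows, limit, offset=0):
    """Skip `offset` rows, then keep at most `limit` rows."""
    return rows[offset:offset + limit]


points = list(range(1, 11))  # stand-ins for the 1st to 10th point of a result

print(limit_offset(points, 3, 3))  # [4, 5, 6]  like LIMIT 3 OFFSET 3
print(limit_offset(points, 3))     # [1, 2, 3]  like LIMIT 3
```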

Time Zone

Return timestamps in a chosen time zone:
SELECT "water_level" FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:18:00Z' tz('America/Chicago')

A SELECT statement has a default time range even when none is specified:
1677-09-21T00:12:43.145224194Z to 2262-04-11T23:47:16.854775806Z

With GROUP BY time(), the default time range runs from the minimum time up to now:
1677-09-21T00:12:43.145224194Z to now()

Using RFC3339 date-time strings:
SELECT "water_level" FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= '2015-08-18T00:00:00.000000000Z' AND time <= '2015-08-18T00:12:00Z'

Using RFC3339-like date-time strings:
SELECT "water_level" FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= '2015-08-18' AND time <= '2015-08-18 00:12:00'

Using epoch timestamps (nanoseconds by default):
SELECT "water_level" FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= 1439856000000000000 AND time <= 1439856720000000000

Using second-precision epoch timestamps:
SELECT "water_level" FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= 1439856000s AND time <= 1439856720s

Arithmetic on an RFC3339 date-time string:
SELECT "water_level" FROM "h2o_feet" WHERE time > '2015-09-18T21:24:00Z' + 6m

Arithmetic on an epoch timestamp:
SELECT "water_level" FROM "h2o_feet" WHERE time > 24043524m - 6m
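
The epoch values in these queries are the same instants as the RFC3339 strings, expressed in different units. A short Python sketch can confirm the correspondence; the helper name is hypothetical.

```python
from datetime import datetime, timezone


def rfc3339_to_epoch_seconds(text: str) -> int:
    """Parse an RFC3339 UTC string (whole seconds) into epoch seconds."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


start = rfc3339_to_epoch_seconds("2015-08-18T00:00:00Z")
print(start)               # 1439856000, used as 1439856000s
print(start * 10**9)       # 1439856000000000000, nanosecond precision
print(rfc3339_to_epoch_seconds("2015-09-18T21:24:00Z") // 60)  # 24043524, used as 24043524m
```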

Relative time

Relative time only:
SELECT "water_level" FROM "h2o_feet" WHERE time > now() - 1h

Relative time combined with absolute time:
SELECT "level description" FROM "h2o_feet" WHERE time > '2015-09-18T21:18:00Z' AND time < now() + 1000d

Regular expressions

Select every tag key and field key that contains the letter l:
SELECT /l/ FROM "h2o_feet" LIMIT 1

Mean of degrees across every measurement whose name contains temperature:
SELECT MEAN("degrees") FROM /temperature/

The location tag contains m and the water_level field is greater than 3:
SELECT MEAN(water_level) FROM "h2o_feet" WHERE "location" =~ /[m]/ AND "water_level" > 3

The location tag has no value:
SELECT * FROM "h2o_feet" WHERE "location" !~ /./

The location tag has a value:
SELECT MEAN("water_level") FROM "h2o_feet" WHERE "location" =~ /./

The value of the level description field contains between:
SELECT MEAN("water_level") FROM "h2o_feet" WHERE "location" = 'santa_monica' AND "level description" =~ /between/

Regular expression in GROUP BY:
SELECT FIRST("index") FROM "h2o_quality" GROUP BY /l/

Data types

Return the water_level values that are stored as float:
SELECT "water_level"::float FROM "h2o_feet" LIMIT 4

Type casting

Cast the float values of water_level to integer:
SELECT "water_level"::integer FROM "h2o_feet" LIMIT 4

Cast the float values of water_level to string (not supported, the query returns no data):
SELECT "water_level"::string FROM "h2o_feet" LIMIT 4

Merge behavior

By default the two series are merged automatically:
SELECT MEAN("water_level") FROM "h2o_feet"

Restrict the query to one series to avoid the merge:
SELECT MEAN("water_level") FROM "h2o_feet" WHERE "location" = 'coyote_creek'

Get a separate result for each of the two series:
SELECT MEAN("water_level") FROM "h2o_feet" GROUP BY "location"

Multiple statements

SELECT MEAN("water_level") FROM "h2o_feet"; SELECT "water_level" FROM "h2o_feet" LIMIT 2

Subqueries

SELECT SUM("max") FROM (SELECT MAX("water_level") FROM "h2o_feet" GROUP BY "location")

InfluxQL-Schema exploration

Show all databases:
SHOW DATABASES

Show the retention policies of a database:
SHOW RETENTION POLICIES ON NOAA_water_database

Show all series in a database:
SHOW SERIES ON NOAA_water_database

Show the series of one measurement that match a condition:
SHOW SERIES ON NOAA_water_database FROM "h2o_quality" WHERE "location" = 'coyote_creek' LIMIT 2

Show all measurements in a database:
SHOW MEASUREMENTS ON NOAA_water_database

Show the measurements whose name matches h2o and whose randtag tag value contains a digit:
SHOW MEASUREMENTS ON NOAA_water_database WITH MEASUREMENT =~ /h2o.*/ WHERE "randtag" =~ /\d/

Show all tag keys in a database:
SHOW TAG KEYS ON "NOAA_water_database"

Show the values of a tag key:
SHOW TAG VALUES ON "NOAA_water_database" WITH KEY = "randtag"

Show the field keys in a database:
SHOW FIELD KEYS ON "NOAA_water_database"

InfluxQL-Data management

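Create a database with the default settings:
CREATE DATABASE "NOAA_water_database"

Create a database with a custom retention policy:
CREATE DATABASE "NOAA_water_database" WITH DURATION 3d REPLICATION 1 SHARD DURATION 1h NAME "liquid"

Drop a database:
DROP DATABASE "NOAA_water_database"

Drop all series of a measurement:
DROP SERIES FROM "h2o_feet"

Drop series by tag value:
DROP SERIES FROM "h2o_feet" WHERE "location" = 'santa_monica'

Delete all points of a measurement:
DELETE FROM "h2o_feet"

Conditional deletes:
DELETE FROM "h2o_quality" WHERE "randtag" = '3'
DELETE FROM "h2o_quality" WHERE time < '2016-01-01'

Drop a measurement:
DROP MEASUREMENT "h2o_feet"

Drop a shard:
DROP SHARD 1

Retention policies:
--DURATION: how long data is kept. The minimum is 1 hour and the maximum is INF (infinite)
--REPLICATION: how many copies of each point are stored in the cluster. By default it usually equals the number of data nodes, or 3 when there are four or more data nodes. To keep data immediately available for queries, set it to no more than the number of data nodes. It has no effect on a single-node instance
--SHARD DURATION: the time range covered by a shard group. It has no INF value. By default it is derived from the DURATION of the retention policy, and its minimum is 1 hour
--Create a policy: CREATE RETENTION POLICY "one_day_only" ON "NOAA_water_database" DURATION 1d REPLICATION 1
--Create a policy and make it the default: CREATE RETENTION POLICY "one_day_only" ON "NOAA_water_database" DURATION 23h60m REPLICATION 1 DEFAULT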
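
The two statements above set the same duration, because 23h60m adds up to 1d. A small Python sketch can check such duration literals; it handles only the whole-second units s, m, h, d, and w, and the function name is hypothetical.

```python
import re

# Seconds per unit for the duration units handled by this sketch.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def duration_to_seconds(literal: str) -> int:
    """Sum the parts of a duration literal such as 23h60m or 3w."""
    parts = re.findall(r"(\d+)([smhdw])", literal)
    # Reject anything that is not fully made of number+unit pairs.
    if not parts or "".join(number + unit for number, unit in parts) != literal:
        raise ValueError(f"unsupported duration literal: {literal!r}")
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in parts)


print(duration_to_seconds("23h60m"))  # 86400
print(duration_to_seconds("1d"))      # 86400
```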

Create and alter a policy:
--Create the policy: CREATE RETENTION POLICY "what_is_time" ON "NOAA_water_database" DURATION 2d REPLICATION 1
--Alter the policy: ALTER RETENTION POLICY "what_is_time" ON "NOAA_water_database" DURATION 3w SHARD DURATION 2h DEFAULT

Drop a policy:
DROP RETENTION POLICY "what_is_time" ON "NOAA_water_database"

InfluxQL-Continuous Queries

Continuous queries run automatically at regular intervals and store their results in a measurement.

Automatic downsampling

CREATE CONTINUOUS QUERY "cq_basic" ON "transportation"
BEGIN
    SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(1h)
END

cq_basic is the name of the continuous query. Every hour it computes the mean of passengers from the bus_data measurement and stores it in the average_passengers measurement of the transportation database.

select * from "average_passengers"

Automatic downsampling, stored under a different retention policy

CREATE CONTINUOUS QUERY "cq_basic_rp" ON "transportation"
BEGIN
    SELECT mean("passengers") INTO "transportation"."three_weeks"."average_passengers" FROM "bus_data" GROUP BY time(1h)
END

SELECT * FROM "transportation"."three_weeks"."average_passengers"

Automatic downsampling, stored in a different database

CREATE CONTINUOUS QUERY "cq_basic_br" ON "transportation"
BEGIN
    SELECT mean(*) INTO "downsampled_transportation"."autogen".:MEASUREMENT FROM /.*/ GROUP BY time(30m)
END

Automatic downsampling with the interval boundaries shifted by an offset

CREATE CONTINUOUS QUERY "cq_basic_offset" ON "transportation"
BEGIN
    SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(1h,15m)
END

Automatic downsampling into 1-hour intervals, run every 30 minutes. The query also runs on the half hour, and the result written on the half hour is overwritten by the result written at the next full hour.

CREATE CONTINUOUS QUERY "cq_advanced_every" ON "transportation"
RESAMPLE EVERY 30m
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(1h)
END

Automatic downsampling that runs every 30 minutes, where each run covers the previous hour of data.

CREATE CONTINUOUS QUERY "cq_advanced_for" ON "transportation"
RESAMPLE FOR 1h
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(30m)
END

Automatic downsampling with EVERY and FOR combined:

CREATE CONTINUOUS QUERY "cq_advanced_every_for" ON "transportation"
RESAMPLE EVERY 1h FOR 90m
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(30m)
END

Automatic downsampling that fills empty intervals

CREATE CONTINUOUS QUERY "cq_advanced_for_fill" ON "transportation"
RESAMPLE FOR 2h
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(1h) fill(1000)
END

List all continuous queries

SHOW CONTINUOUS QUERIES

Drop a continuous query

DROP CONTINUOUS QUERY "idle_hands" ON ""

InfluxQL-Functions

  • COUNT()
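  • DISTINCT()
  • INTEGRAL()
  • MEAN()
  • MEDIAN() middle value of the sorted field values
  • MODE() most frequent field value
  • SPREAD() difference between the largest and smallest field value
  • STDDEV() standard deviation of the field values
  • SUM()
  • BOTTOM()
  • FIRST()
  • LAST()
  • MAX()
  • MIN()
  • PERCENTILE() field value at a given percentile
  • SAMPLE() random sample
  • TOP()
  • ABS()
  • ACOS()
  • ASIN()
  • ATAN()
  • ATAN2()
  • CEIL()
  • COS()
  • CUMULATIVE_SUM()
  • DERIVATIVE() rate of change
  • DIFFERENCE() difference between consecutive values
  • ELAPSED() difference between consecutive timestamps
  • EXP() exponential
  • FLOOR()
  • LN() natural logarithm
  • LOG()
  • LOG2()
  • LOG10()
  • MOVING_AVERAGE() average over a rolling window
  • NON_NEGATIVE_DERIVATIVE() non-negative rate of change
  • NON_NEGATIVE_DIFFERENCE() non-negative difference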
  • POW()
  • ROUND()
  • SIN()
  • SQRT()
  • TAN()
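
A few of the functions in this list are easy to picture on a plain list of numbers. The Python sketch below mirrors the descriptions of SPREAD(), DIFFERENCE(), CUMULATIVE_SUM(), and MOVING_AVERAGE() on an in-memory list. It is an illustration of the arithmetic only, not InfluxDB's implementation, and the sample values are made up.

```python
from itertools import accumulate


def spread(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def difference(values):
    """Each value minus the value before it."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def cumulative_sum(values):
    """Running total of the values."""
    return list(accumulate(values))


def moving_average(values, window):
    """Mean of every run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


levels = [2, 4, 4, 10]  # made-up field values in time order
print(spread(levels))             # 8
print(difference(levels))         # [2, 0, 6]
print(cumulative_sum(levels))     # [2, 6, 10, 20]
print(moving_average(levels, 2))  # [3.0, 4.0, 7.0]
```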

InfluxQL-Mathematical operations

Addition:
SELECT "A" + 5 FROM "add"

Subtraction:
SELECT "A" - "B" from ""

Multiplication:
SELECT "A" * "B" * "C" from ""

Division:
SELECT 10 / "A" FROM ""

Modulo:
SELECT "B" % 2 FROM ""