# Basic Concepts

InfluxDB ingests data through the line protocol: each line represents one point.

```
weather,location=us-midwest temperature=82 1465839830100400200

The line above has the form:
measurement,tag_set field_set timestamp

weather is the measurement
location=us-midwest is the tag_set, a set of key-value pairs
temperature=82 is the field_set, a set of key-value pairs
1465839830100400200 is the timestamp (2016-06-13T17:43:50.1004002Z)

Notes:
--A single space separates the measurement and tag_set from the field_set, and the field_set from the timestamp
--The timestamp is a Unix timestamp in nanoseconds. If it is omitted, the server's nanosecond UTC timestamp is used. In a cluster, the servers' clocks must be synchronized, otherwise the data will be inaccurate

Examples:
--weather,location=us-midwest,season=summer temperature=82 1465839830100400200
--weather,location=us-midwest temperature=82,humidity=71 1465839830100400200
```

**Data types**

```
Tag values in the tag_set are always strings. InfluxDB cannot perform calculations on them, so a tag value cannot be passed as an argument to an InfluxQL function

The timestamp is a Unix timestamp. The minimum timestamp is -9223372036854775806, i.e. 1677-09-21T00:12:43.145224194Z. The maximum timestamp is 9223372036854775806, i.e. 2262-04-11T23:47:16.854775806Z. The default precision is nanoseconds; a different precision can be selected through the API

Field values can be float, integer, string, or boolean.
--weather,location=us-midwest temperature=82 1465839830100400200 (82 is treated as a float)
--weather,location=us-midwest temperature=82i 1465839830100400200 (82i is treated as an integer)
--weather,location=us-midwest temperature="too warm" 1465839830100400200 (too warm is treated as a string)
--weather,location=us-midwest too_hot=true 1465839830100400200 (true is a boolean; true can be written t, T, true, True, TRUE and false can be written f, F, false, False, FALSE)

Writing a different type to an existing field within the same shard raises an error:
--INSERT weather,location=us-midwest temperature=82 1465839830100400200
--INSERT weather,location=us-midwest temperature=82i 1465839830100400300
ERR: {"error":"field type conflict: input field \"temperature\" on measurement \"weather\" is type int64"}

Writing a different type to the same field in a different shard does not raise an error (the second timestamp below is about 15 days later, so it falls in another shard):
--INSERT weather,location=us-midwest temperature=82 1465839830100400200
--INSERT weather,location=us-midwest temperature=82i 1467154750000000000
```

**Quoting**

```
Do not double-quote the timestamp:
--INSERT weather,location=us-midwest temperature=82 "1465839830100400200"
ERR: {"error":"unable to parse 'weather,location=us-midwest temperature=82 \"1465839830100400200\"': bad timestamp"}

Do not single-quote field values:
--INSERT weather,location=us-midwest temperature='too warm'
ERR: {"error":"unable to parse 'weather,location=us-midwest temperature='too warm'': invalid boolean"}

Do not single- or double-quote measurement names, tag keys, tag values, or field keys. This does not raise an error, but InfluxDB treats the quotes as part of the name:
--INSERT "weather",location=us-midwest temperature=87 1465839830100400200
--SHOW MEASUREMENTS
--lists "weather", quotes included
--which makes querying awkward: SELECT * FROM "\"weather\""

Do not double-quote numeric field values; InfluxDB treats them as strings:
--INSERT weather,location=us-midwest temperature="82"
```

**Special Characters**

```
Escape a comma with \:
weather,location=us\,midwest temperature=82 1465839830100400200
Escape an equals sign with \:
weather,location=us-midwest temp\=rature=82 1465839830100400200
Escape a space with \:
weather,location\ place=us-midwest temperature=82 1465839830100400200
Escape a comma in the measurement with \:
wea\,ther,location=us-midwest temperature=82 1465839830100400200
Escape a space in the measurement with \:
wea\ ther,location=us-midwest temperature=82 1465839830100400200
Escape double quotes in a string field value with \:
weather,location=us-midwest temperature="too\"hot\"" 1465839830100400200

How / and \ behave:
--weather,location=us-midwest temperature_str="too hot/cold" 1465839830100400201
--weather,location=us-midwest temperature_str="too hot\cold" 1465839830100400202
--weather,location=us-midwest temperature_str="too hot\\cold" 1465839830100400203
--weather,location=us-midwest temperature_str="too hot\\\cold" 1465839830100400204
--weather,location=us-midwest temperature_str="too hot\\\\cold" 1465839830100400205
--weather,location=us-midwest temperature_str="too hot\\\\\cold" 1465839830100400206

> SELECT * FROM "weather"
name: weather
time                location   temperature_str
1465839830100400201 us-midwest too hot/cold
1465839830100400202 us-midwest too hot\cold
1465839830100400203 us-midwest too hot\cold      (two backslashes, one removed)
1465839830100400204 us-midwest too hot\\cold     (three backslashes, one removed)
1465839830100400205 us-midwest too hot\\cold     (four backslashes, two removed)
1465839830100400206 us-midwest too hot\\\cold    (five backslashes, two removed)
```

**Keywords**

```
time can be used as the name of a database, measurement, retention
policy, subscription, or user, but time cannot be used as a tag key or field key
```

**Aggregation**

InfluxQL functions perform calculations over a set of data.

```
==COUNT()
> SELECT COUNT("water_level") FROM "h2o_feet"
Returns the number of non-null values of the water_level field in the h2o_feet measurement

> SELECT COUNT(*) FROM "h2o_feet"
Returns the number of non-null values of every field in the h2o_feet measurement

> SELECT COUNT(/water/) FROM "h2o_feet"
Returns the number of non-null values of every field whose key contains water

> SELECT COUNT("water_level") FROM "h2o_feet" WHERE time >= '2015-08-17T23:48:00Z' AND time <= '2015-08-18T00:54:00Z' GROUP BY time(12m),* fill(200) LIMIT 7 SLIMIT 1
Restricts the time range, groups by 12-minute intervals and by every tag, fills empty intervals with 200, and returns at most 7 points from at most 1 series
```

# InfluxQL-Basics

**Connecting to and exiting the database**

```
$ .\influx -precision rfc3339
Connected to http://localhost:8086 version 1.7.7
InfluxDB shell version: 1.7.1

The rfc3339 timestamp format is: YYYY-MM-DDTHH:MM:SS.nnnnnnnnnZ

$ exit
```

**Creating a database**

- Run `influxd.exe`
- Start influx: `./influx -precision rfc3339`
- Create the database

```
$ CREATE DATABASE NOAA_water_database
```

**Downloading sample data and writing it to the local database**

```
Download the data:
$ curl https://s3.amazonaws.com/noaa.water-database/NOAA_data.txt -o NOAA_data.txt
A NOAA_data.txt file now exists in the current directory

Import it into the local database:
$ ./influx -import -path=NOAA_data.txt -precision=s -database=NOAA_water_database
This reports an error: unknown arguments: .txt -precision=s
In the directory that contains `influx.exe`, rename `NOAA_data.txt` to `NOAA_data`
$ ./influx -import -path=NOAA_data -precision=s -database=NOAA_water_database

Connect to the database:
$ ./influx -precision rfc3339 -database NOAA_water_database

List all tables, i.e. measurements:
$ SHOW measurements
```

# InfluxQL-Data exploration

> Querying

**Count the non-null values of a field**

```
SELECT COUNT("water_level") FROM h2o_feet
```

**Select the first few points**

```
SELECT * FROM h2o_feet LIMIT 5
```

**Select all fields and tags**

```
SELECT * FROM "h2o_feet"
```

**Select specific tags and fields**

```
$ ./influx -precision rfc3339
$ USE NOAA_water_database
$ SELECT "level description","location","water_level" FROM "h2o_feet"
```

**Select tags and fields, distinguishing them by type**

```
SELECT "level description"::field,"location"::tag,"water_level"::field FROM "h2o_feet"
```

**Select all fields**

```
SELECT *::field FROM "h2o_feet"
```

**Basic arithmetic on a field**

```
SELECT ("water_level" * 2) + 4 from
"h2o_feet"
```

**Query data from multiple measurements**

```
select * from "h2o_feet","h2o_pH"
```

**Query a fully qualified measurement (database, retention policy, measurement)**

```
select * from "NOAA_water_database"."autogen"."h2o_feet"
```

**Query all data of a measurement in a given database (default retention policy)**

```
select * from "NOAA_water_database".."h2o_feet"
```

**A query that selects a tag must also select at least one field**

```
select "water_level","location" from "h2o_feet"
```

> Filtering

**WHERE clause syntax**

```
Operators supported for fields:
field_key <operator> ['string' | boolean | float | integer]
=   <>   !=   >   >=   <   <=

Operators supported for tags:
tag_key <operator> ['tag_value']
=   <>   !=
```

**Filter by field value**

```
select * from "h2o_feet" where "water_level">8
```

**Filter by the string value of a field**

```
select * from "h2o_feet" where "level description" = 'below 3 feet'
```

**Filter by a calculation**

```
select * from "h2o_feet" where "water_level" + 2 > 11.9
```

**Filter by tag value**

```
select "water_level" from "h2o_feet" where "location" = 'santa_monica'
```

**Filter by tag and field**

```
select "water_level" from "h2o_feet" where "location" <> 'santa_monica' and (water_level < -0.59 OR water_level > 9.95)
```

**Filter by timestamp**

```
select * from h2o_feet where time > now() - 7d
```

> Grouping

**Group by a tag**

```
select MEAN(water_level) from h2o_feet group by location
Groups the points by location, then returns the mean water_level of each group
```

**Group by several tags**

```
select MEAN(index) from h2o_quality group by location,randtag
```

**Group by all tags**

```
select MEAN(index) from h2o_quality group by *
```

**Group by time interval**

```
SELECT COUNT("water_level") FROM "h2o_feet" WHERE "location"='coyote_creek' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(12m)
```

**Group by time interval and tag**

```
SELECT COUNT("water_level") FROM "h2o_feet" WHERE time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:30:00Z' GROUP BY time(12m),"location"
```

**Group by time interval, shifting the interval boundaries forward**

```
SELECT MEAN("water_level") FROM "h2o_feet" WHERE "location"='coyote_creek' AND time >= '2015-08-18T00:06:00Z' AND time <= '2015-08-18T00:54:00Z' GROUP BY time(18m,6m)
```

**Combining GROUP BY and fill()**

```
> SELECT MAX("water_level") FROM "h2o_feet" WHERE "location"='coyote_creek' AND time >= '2015-09-18T16:00:00Z'
AND time <= '2015-09-18T16:42:00Z' GROUP BY time(12m) fill(100)
```

> INTO

**Copying a database into a new one**

A database cannot be renamed. The workaround is to create a new database and copy the original data into it with the INTO clause.

```
SELECT * INTO "copy_NOAA_water_database"."autogen".:MEASUREMENT FROM "NOAA_water_database"."autogen"./.*/ GROUP BY *

:MEASUREMENT means every measurement in the source database is copied to a measurement of the same name in the new database.
autogen is the retention policy. It must exist in both the source and the new database, otherwise the INTO statement cannot run.
GROUP BY * is essential: it copies the tags of every measurement in NOAA_water_database as tags into copy_NOAA_water_database. Without it, the tags of the source measurements become fields in copy_NOAA_water_database.

Steps:
--Create the new database: create database copy_NOAA_water_database
--Switch to the source database: use NOAA_water_database
--Copy the data with INTO:
SELECT * INTO "copy_NOAA_water_database"."autogen".:MEASUREMENT FROM "NOAA_water_database"."autogen"./.*/ GROUP BY *
--Switch to the new database: use copy_NOAA_water_database
--List all measurements in the new database: show measurements
--Check that the new database holds data: select * from h2o_feet LIMIT 5
```

**For large volumes of data, copy step by step, by measurement and time range**

```
SELECT * INTO .. FROM .. WHERE time > now() - 100w and time < now() - 90w GROUP BY *
SELECT * INTO .. FROM .. WHERE time > now() - 90w and time < now() - 80w GROUP BY *
SELECT * INTO .. FROM ..
WHERE time > now() - 80w and time < now() - 70w GROUP BY *
```

**Write query results into a measurement**

```
SELECT "water_level" INTO "h2o_feet_copy_1" FROM "h2o_feet" WHERE "location" = 'coyote_creek'
```

> Sorting

```
Sort by time, descending:
SELECT "water_level" FROM "h2o_feet" WHERE "location" = 'santa_monica' ORDER BY time DESC

Sort grouped results:
SELECT MEAN("water_level") FROM "h2o_feet" WHERE time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:42:00Z' GROUP BY time(12m) ORDER BY time DESC
```

> LIMIT and SLIMIT

```
Limit the number of points returned:
SELECT "water_level","location" FROM "h2o_feet" LIMIT 3

Limit the number of series returned:
SELECT "water_level" FROM "h2o_feet" GROUP BY * SLIMIT 1
```

> OFFSET and SOFFSET

```
Return points 4, 5, and 6:
SELECT "water_level","location" FROM "h2o_feet" LIMIT 3 OFFSET 3

Return points 1, 2, and 3:
SELECT "water_level","location" FROM "h2o_feet" LIMIT 3

Skip the first 2 points and return the next 2, from 1 series:
SELECT MEAN("water_level") FROM "h2o_feet" WHERE time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:42:00Z' GROUP BY *,time(12m) ORDER BY time DESC LIMIT 2 OFFSET 2 SLIMIT 1

Return the second series:
SELECT "water_level" FROM "h2o_feet" GROUP BY * SLIMIT 1 SOFFSET 1
```

> Time Zone

```
Return timestamps in a chosen time zone:
SELECT "water_level" FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= '2015-08-18T00:00:00Z' AND time <= '2015-08-18T00:18:00Z' tz('America/Chicago')

Even when a SELECT statement specifies no time range, a default range applies:
1677-09-21 00:12:43.145224194 to 2262-04-11T23:47:16.854775806Z

With GROUP BY time(), the default range runs from the minimum timestamp to the present:
1677-09-21 00:12:43.145224194 to now()

Using RFC3339 date-time strings:
SELECT "water_level" FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= '2015-08-18T00:00:00.000000000Z' AND time <= '2015-08-18T00:12:00Z'

Using RFC3339-like date-time strings:
SELECT "water_level" FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= '2015-08-18' AND time <= '2015-08-18 00:12:00'

Using epoch timestamps (nanoseconds):
SELECT "water_level" FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >= 1439856000000000000 AND time <= 1439856720000000000

Using second-precision epoch timestamps:
SELECT "water_level" FROM "h2o_feet" WHERE "location" = 'santa_monica' AND time >=
1439856000s AND time <= 1439856720s

Arithmetic on an RFC3339-like date-time string:
SELECT "water_level" FROM "h2o_feet" WHERE time > '2015-09-18T21:24:00Z' + 6m

Arithmetic on an epoch timestamp:
SELECT "water_level" FROM "h2o_feet" WHERE time > 24043524m - 6m
```

> Relative time

```
Relative time only:
SELECT "water_level" FROM "h2o_feet" WHERE time > now() - 1h

Relative and absolute time combined:
SELECT "level description" FROM "h2o_feet" WHERE time > '2015-09-18T21:18:00Z' AND time < now() + 1000d
```

> Regular expressions

```
Select every tag key and field key that contains the letter l:
SELECT /l/ FROM "h2o_feet" LIMIT 1

Mean of degrees across every measurement whose name contains temperature:
SELECT MEAN("degrees") FROM /temperature/

The location tag contains m and the water_level field is greater than 3:
SELECT MEAN(water_level) FROM "h2o_feet" WHERE "location" =~ /[m]/ AND "water_level" > 3

The location tag has no value:
SELECT * FROM "h2o_feet" WHERE "location" !~ /./

The location tag has a value:
SELECT MEAN("water_level") FROM "h2o_feet" WHERE "location" =~ /./

The level description field contains between:
SELECT MEAN("water_level") FROM "h2o_feet" WHERE "location" = 'santa_monica' AND "level description" =~ /between/

A regular expression in GROUP BY (every tag whose key contains l):
SELECT FIRST("index") FROM "h2o_quality" GROUP BY /l/
```

> Data types

```
Return the water_level field values whose type is float:
SELECT "water_level"::float FROM "h2o_feet" LIMIT 4
```

> Casting

```
Cast the float values of water_level to integer:
SELECT "water_level"::integer FROM "h2o_feet" LIMIT 4

Cast the float values of water_level to string (not supported):
SELECT "water_level"::string FROM "h2o_feet" LIMIT 4
```

> Merge behavior

```
By default the two series are merged automatically:
SELECT MEAN("water_level") FROM "h2o_feet"

Avoid the automatic merge by selecting a single series:
SELECT MEAN("water_level") FROM "h2o_feet" WHERE "location" = 'coyote_creek'

Get the data of each series separately:
SELECT MEAN("water_level") FROM "h2o_feet" GROUP BY "location"
```

> Multiple statements

```
SELECT MEAN("water_level") FROM "h2o_feet"; SELECT "water_level" FROM "h2o_feet" LIMIT 2
```

> Subqueries

```
SELECT SUM("max") FROM (SELECT MAX("water_level") FROM "h2o_feet" GROUP BY "location")
```

# InfluxQL-Schema exploration

```
List all databases:
SHOW DATABASES

List the retention policies of a database:
SHOW RETENTION POLICIES ON NOAA_water_database

List all series in a database:
SHOW SERIES ON NOAA_water_database

List the series of one measurement that match a condition:
SHOW SERIES ON NOAA_water_database FROM "h2o_quality" WHERE "location" = 'coyote_creek' LIMIT 2

List all measurements in a database:
SHOW MEASUREMENTS ON NOAA_water_database

List the measurements that start with h2o and whose randtag tag value contains a digit:
SHOW MEASUREMENTS ON NOAA_water_database WITH MEASUREMENT =~ /h2o.*/ WHERE "randtag" =~ /\d/

List all tag keys in a database:
SHOW TAG KEYS ON "NOAA_water_database"

List the values of a tag:
SHOW TAG VALUES ON "NOAA_water_database" WITH KEY = "randtag"

List the field keys in a database:
SHOW FIELD KEYS ON "NOAA_water_database"
```

# InfluxQL-Data management

```
Create a database with the default settings:
CREATE DATABASE "NOAA_water_database"

Create a database with a custom retention policy:
CREATE DATABASE "NOAA_water_database" WITH DURATION 3d REPLICATION 1 SHARD DURATION 1h NAME "liquid"

Drop a database:
DROP DATABASE "NOAA_water_database"

Drop all series of a measurement:
DROP SERIES FROM "h2o_feet"

Drop series by tag value:
DROP SERIES FROM "h2o_feet" WHERE "location" = 'santa_monica'

Delete all points of a measurement:
DELETE FROM "h2o_feet"

Delete with a condition:
DELETE FROM "h2o_quality" WHERE "randtag" = '3'
DELETE FROM "h2o_quality" WHERE time < '2016-01-01'

Drop a measurement:
DROP MEASUREMENT "h2o_feet"

Drop a shard:
DROP SHARD 1

Retention policies:
DURATION: the minimum is 1 hour, the maximum is INF (infinite).
REPLICATION: determines how many copies of each point are stored in the cluster; the default is 3. To keep data immediately available to queries, this value should be less than or equal to the number of data nodes in the cluster. On a single-node instance the REPLICATION setting has no effect.
SHARD DURATION: sets the time range covered by a shard group. INF is not allowed here. By default the value is derived from the retention policy's DURATION. The minimum SHARD DURATION is 1 hour.

--CREATE RETENTION POLICY "one_day_only" ON "NOAA_water_database" DURATION 1d REPLICATION 1
--Create a policy and make it the default: CREATE RETENTION POLICY "one_day_only" ON "NOAA_water_database" DURATION 23h60m REPLICATION 1 DEFAULT

Create and then modify a policy:
--Create: CREATE RETENTION POLICY "what_is_time" ON "NOAA_water_database" DURATION 2d REPLICATION 1
--Modify: ALTER RETENTION POLICY "what_is_time" ON "NOAA_water_database" DURATION 3w SHARD DURATION 2h DEFAULT

Drop a policy:
DROP RETENTION POLICY "what_is_time" ON "NOAA_water_database"
```

# InfluxQL-Continuous Queries

Continuous queries run automatically at regular intervals and store their results in a measurement.

**Automatically downsampling data**:

```
CREATE CONTINUOUS QUERY "cq_basic" ON "transportation"
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(1h)
END
cq_basic is the name of the continuous query. Every hour it computes the mean from the bus_data measurement and stores the result in the average_passengers measurement of the transportation database.

select * from "average_passengers"
```

**Automatically downsampling data into a different RETENTION POLICY**:

```
CREATE CONTINUOUS QUERY "cq_basic_rp" ON "transportation"
BEGIN
  SELECT mean("passengers") INTO "transportation"."three_weeks"."average_passengers" FROM "bus_data" GROUP BY time(1h)
END

SELECT * FROM "transportation"."three_weeks"."average_passengers"
```

**Automatically downsampling data into a different database**:

```
CREATE CONTINUOUS QUERY "cq_basic_br" ON "transportation"
BEGIN
  SELECT mean(*) INTO "downsampled_transportation"."autogen".:MEASUREMENT FROM /.*/ GROUP BY time(30m)
END
```

**Automatically downsampling data with an offset, saving to another measurement**:

```
CREATE CONTINUOUS QUERY "cq_basic_offset" ON "transportation"
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(1h,15m)
END
```

Downsample into 1-hour intervals but run the query every 30 minutes. The run on the half hour writes a result for the current hour, and the run at the next full hour overwrites it.

```
CREATE CONTINUOUS QUERY "cq_advanced_every" ON "transportation"
RESAMPLE EVERY 30m
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(1h)
END
```

Run every 30 minutes, each time computing over the previous hour of data.

```
CREATE CONTINUOUS QUERY "cq_advanced_for" ON "transportation"
RESAMPLE FOR 1h
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(30m)
END
```

Combining EVERY and FOR:

```
CREATE CONTINUOUS QUERY "cq_advanced_every_for" ON "transportation"
RESAMPLE EVERY 1h FOR 90m
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(30m)
END
```

Downsampling with empty intervals filled in:

```
CREATE CONTINUOUS QUERY "cq_advanced_for_fill" ON "transportation"
RESAMPLE FOR 2h
BEGIN
  SELECT mean("passengers") INTO "average_passengers" FROM "bus_data" GROUP BY time(1h) fill(1000)
END
```

List all continuous queries:

```
SHOW CONTINUOUS QUERIES
```

Drop a continuous query:

```
DROP CONTINUOUS QUERY "idle_hands" ON ""
```

# InfluxQL-Functions

- COUNT()
- DISTINCT()
- INTEGRAL()
- MEAN()
- MEDIAN(): the median of the sorted field values
- MODE(): the most frequent field value
- SPREAD(): the difference between the maximum and minimum field values
- STDDEV(): the standard deviation of the field values
- SUM()
- BOTTOM()
- FIRST()
- LAST()
- MAX()
- MIN()
- PERCENTILE(): the field value at a given percentile
- SAMPLE(): a random sample
- TOP()
- ABS()
- ACOS()
- ASIN()
- ATAN()
- ATAN2()
- CEIL()
- COS()
- CUMULATIVE_SUM()
- DERIVATIVE(): rate of change
- DIFFERENCE(): difference between values
- ELAPSED(): difference between timestamps
- EXP(): exponential
- FLOOR()
- LN(): natural logarithm
- LOG()
- LOG2()
- LOG10()
- MOVING_AVERAGE(): average over a rolling window
- NON_NEGATIVE_DERIVATIVE(): non-negative rate of change
- NON_NEGATIVE_DIFFERENCE(): non-negative difference
- POW()
- ROUND()
- SIN()
- SQRT()
- TAN()

# InfluxQL-Mathematical operations

```
Addition:
SELECT "A" + 5 FROM "add"

Subtraction:
SELECT "A" - "B" from ""

Multiplication:
SELECT "A" * "B" * "C" from ""

Division:
SELECT 10 / "A" FROM ""

Modulo:
SELECT "B" % 2 FROM ""
```
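
As a closing illustration, the line protocol rules from the first section (escaping of commas, equals signs, and spaces, double-quoted string fields, the `i` suffix for integers) can be sketched in a few lines of Python. This is only a minimal sketch: `to_line` and its helpers are hypothetical names made up for this example, not part of any InfluxDB client library, and the sketch assumes tags and fields arrive as plain dictionaries.

```python
def _escape(text, chars):
    """Backslash-escape every character of `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field_value(value):
    """Render a Python value as a line protocol field value."""
    # bool is checked first because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"          # the i suffix marks an integer
    if isinstance(value, float):
        return repr(value)          # a bare number is stored as a float
    # anything else is written as a double-quoted string
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp=None):
    """Build one line: measurement,tag_set field_set [timestamp]."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key, value in sorted(tags.items()):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(value), ",= ")
    field_set = ",".join(
        _escape(key, ",= ") + "=" + _format_field_value(value)
        for key, value in fields.items()
    )
    line = head + " " + field_set
    if timestamp is not None:
        line += " " + str(int(timestamp))  # nanoseconds since the epoch
    return line


if __name__ == "__main__":
    print(to_line("weather", {"location": "us-midwest"},
                  {"temperature": 82.0, "humidity": 71},
                  1465839830100400200))
    # weather,location=us-midwest temperature=82.0,humidity=71i 1465839830100400200
```

Nothing is ever wrapped in quotes except string field values, which matches the quoting advice above: quoting a measurement name, tag, or field key would make the quotes part of the name.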