若泽大数据 www.ruozedata.com

Real-Time Data Source Sync Middleware for Big Data: Canal vs. Maxwell in Production

Published 2018-05-14 | Edited 2019-07-18 | in Real-time sync middleware
I. Data source sync middleware:

Canal
https://github.com/alibaba/canal
https://github.com/Hackeruncle/syncClient

Maxwell
https://github.com/zendesk/maxwell

II. Architecture and usage

MySQL --> middleware (mcp) --> Kafka --> ? --> storage (HBase/Kudu/Cassandra), incremental
a. Full load (bootstrap)
b. Incremental
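
The "?" stage in the pipeline above has to turn change events into writes against the target store. A minimal sketch of that step, assuming events shaped like Maxwell's JSON (the "bootstrap-start", "bootstrap-insert" and "bootstrap-complete" types are what Maxwell emits during a bootstrap). The in-memory dict standing in for HBase/Kudu/Cassandra, the function name `apply_event`, and the single-column primary key `id` are assumptions for illustration only.

```python
def apply_event(store: dict, event: dict, pk: str = "id") -> None:
    """Apply one change event to a key-value store (here a plain dict)."""
    kind = event["type"]
    if kind in ("insert", "update", "bootstrap-insert"):
        row = event["data"]
        # Upsert keyed by primary key: replaying the same event is harmless.
        store[row[pk]] = row
    elif kind == "delete":
        # For deletes, "data" holds the row that was removed.
        store.pop(event["data"][pk], None)
    # bootstrap-start / bootstrap-complete carry no row data and are ignored.


store = {}
apply_event(store, {"type": "bootstrap-insert", "data": {"id": 1, "age": 5}})
apply_event(store, {"type": "insert", "data": {"id": 999, "age": 18}})
apply_event(store, {"type": "update", "data": {"id": 999, "age": 29}, "old": {"age": 18}})
print(store[999]["age"])
```

Treating full-load and incremental rows as the same upsert keeps the consumer idempotent, so a bootstrap can overlap with live changes. The sketch does not handle an update that changes the primary key itself.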

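1. Comparison
Item               | Canal (server)   | Maxwell (server + client)
Language           | Java             | Java
Activity           | Active           | Active
HA                 | Supported        | Custom (build it yourself), but supports resuming from the last position
Data landing       | Custom           | Lands in Kafka
Partitioning       | Supported        | Supported
Bootstrap          | Not supported    | Supported
Data format        | Free-form        | JSON (fixed format); Spark can read JSON into a DataFrame
Documentation      | Fairly detailed  | Fairly detailed
Random read        | Supported        | Supported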

My personal choice: Maxwell

a. Server and client in one package, lightweight
b. Supports resuming from the last position + bootstrap + JSON
Can do SELECT * from table (bootstrapping) initial loads of a table.
supports automatic position recover on master promotion
flexible partitioning schemes for Kafka - by database, table, primary key, or column
Maxwell pulls all this off by acting as a full mysql replica, including a SQL parser for create/alter/drop statements (nope, there was no other way).
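
To make the "partition by database, table, or primary key" idea concrete, here is a small sketch of how a partition could be derived from an event. This is not Maxwell's own hashing code: the function names, the CRC32 hash, and the assumption that the primary key is a single column called `id` are all illustrative.

```python
import zlib


def partition_key(event: dict, scheme: str) -> str:
    """Build the string that gets hashed for a given partitioning scheme."""
    if scheme == "database":
        return event["database"]
    if scheme == "table":
        return f'{event["database"]}.{event["table"]}'
    if scheme == "primary_key":
        # Assumption: the primary key is the single column "id".
        return f'{event["database"]}.{event["table"]}.{event["data"]["id"]}'
    raise ValueError(f"unknown scheme: {scheme}")


def choose_partition(event: dict, scheme: str, num_partitions: int) -> int:
    # CRC32 is a stand-in hash chosen for determinism, not Maxwell's own.
    key = partition_key(event, scheme).encode("utf-8")
    return zlib.crc32(key) % num_partitions


event = {"database": "ruozedb", "table": "ruozedata", "data": {"id": 999}}
print(partition_key(event, "primary_key"))
print(choose_partition(event, "table", 3))
```

The practical consequence of the choice: partitioning by table keeps all changes to one table in order on one partition, while partitioning by primary key spreads load but only preserves order per row.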

2. Walkthrough of the official site

Bilibili video

3. Deployment

3.1 MySQL Install
https://github.com/Hackeruncle/MySQL/blob/master/MySQL%205.6.23%20Install.txt
https://ke.qq.com/course/262452?tuin=11cffd50

3.2 Modify the MySQL configuration

$ vi /etc/my.cnf

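[mysqld]
# Maxwell reads row-level change events, so the binlog must use ROW format.
# Binary logging (log-bin) and server-id must also be set; they are assumed
# to be configured already by the install guide in 3.1.
binlog_format=row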

$ service mysql start
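
Before starting Maxwell it can be worth confirming that the setting actually landed in the `[mysqld]` section. The following is a rough sketch using Python's standard `configparser`; the function name `binlog_format_is_row` is hypothetical, and the check only inspects the text it is given (it does not follow `!include` directives or query the running server).

```python
import configparser


def binlog_format_is_row(cnf_text: str) -> bool:
    """Return True if [mysqld] sets binlog_format (or binlog-format) to row."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare flags such as "skip-name-resolve"
        strict=False,          # tolerate repeated sections or options
        interpolation=None,    # treat "%" in values literally
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return False
    for key, value in parser.items("mysqld"):
        # MySQL accepts both dashes and underscores in option names.
        if key.replace("-", "_") == "binlog_format":
            return (value or "").strip().lower() == "row"
    return False


sample = """
[mysqld]
binlog_format=row
"""
print(binlog_format_is_row(sample))  # prints True
```

In practice one would pass the contents of /etc/my.cnf to the function; running `show variables like 'binlog_format';` in the mysql client is the authoritative check on a live server.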

3.3 Create the Maxwell database and user
mysql> create database maxwell;
Query OK, 1 row affected (0.03 sec)

mysql> GRANT ALL on maxwell.* to 'maxwell'@'%' identified by 'ruozedata';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE on *.* to 'maxwell'@'%';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql>

3.4 Extract the archive

[root@hadoop000 software]# tar -xzvf maxwell-1.14.4.tar.gz

3.5 Test with the STDOUT producer:

bin/maxwell --user='maxwell' \
--password='ruozedata' --host='127.0.0.1' \
--producer=stdout

Test 1: INSERT statement:

mysql> insert into ruozedata(id,name,age,address) values(999,'jepson',18,'www.ruozedata.com');
Query OK, 1 row affected (0.03 sec)

Maxwell output:

{
    "database": "ruozedb",
    "table": "ruozedata",
    "type": "insert",
    "ts": 1525959044,
    "xid": 201,
    "commit": true,
    "data": {
        "id": 999,
        "name": "jepson",
        "age": 18,
        "address": "www.ruozedata.com",
        "createtime": "2018-05-10 13:30:44",
        "creuser": null,
        "updatetime": "2018-05-10 13:30:44",
        "updateuser": null
    }
}
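
Because the format is fixed JSON, a downstream consumer needs very little code to pick an event apart. A minimal sketch using only the standard library follows; the function name `summarize` and the shape of its return value are assumptions, and the sample string is an abbreviated copy of the event above.

```python
import json
from datetime import datetime, timezone

raw = (
    '{"database":"ruozedb","table":"ruozedata","type":"insert",'
    '"ts":1525959044,"xid":201,"commit":true,'
    '"data":{"id":999,"name":"jepson","age":18}}'
)


def summarize(event_json: str) -> dict:
    """Extract the routing fields and the row payload from one event."""
    event = json.loads(event_json)
    return {
        "target": f'{event["database"]}.{event["table"]}',
        "op": event["type"],
        # "ts" is a Unix timestamp in seconds.
        "when_utc": datetime.fromtimestamp(event["ts"], tz=timezone.utc).isoformat(),
        "row": event["data"],
    }


print(summarize(raw))
```

Here the timestamp 1525959044 converts to 2018-05-10T13:30:44 UTC, which matches the `createtime` value in the row.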

Test 2: UPDATE statement:

mysql> update ruozedata set age=29 where id=999;

Question: with binlog_format=ROW, how many columns do you think the binlog records for this update?

Maxwell output:

{
    "database": "ruozedb",
    "table": "ruozedata",
    "type": "update",
    "ts": 1525959208,
    "xid": 255,
    "commit": true,
    "data": {
        "id": 999,
        "name": "jepson",
        "age": 29,
        "address": "www.ruozedata.com",
        "createtime": "2018-05-10 13:30:44",
        "creuser": null,
        "updatetime": "2018-05-10 13:33:28",
        "updateuser": null
    },
    "old": {
        "age": 18,
        "updatetime": "2018-05-10 13:30:44"
    }
}
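
The output above suggests how to read an update event: "data" carries the complete row after the change, while "old" carries only the columns whose values changed. Under that reading, the pre-update row can be rebuilt by overlaying "old" on "data". A short sketch, with hypothetical helper names and an abbreviated copy of the event:

```python
def changed_columns(event: dict) -> list:
    """Names of the columns that the update actually modified."""
    return sorted(event.get("old", {}))


def before_image(event: dict) -> dict:
    """Row as it looked before the update: new values overlaid with old ones."""
    return {**event["data"], **event.get("old", {})}


update_event = {
    "type": "update",
    "data": {
        "id": 999,
        "name": "jepson",
        "age": 29,
        "updatetime": "2018-05-10 13:33:28",
    },
    "old": {"age": 18, "updatetime": "2018-05-10 13:30:44"},
}

print(changed_columns(update_event))  # prints ['age', 'updatetime']
print(before_image(update_event)["age"])  # prints 18
```

Note that two columns changed even though the SQL only set `age`; `updatetime` appears to be maintained automatically by the table definition.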

4. Other notes and new features

4.1 The kafka_version option
Using kafka version: 0.11.0.1 0.10
jar:

[root@hadoop000 kafka-clients]# ll
total 4000
-rw-r--r--. 1 ruoze games 746207 May 8 06:34 kafka-clients-0.10.0.1.jar
-rw-r--r--. 1 ruoze games 951041 May 8 06:35 kafka-clients-0.10.2.1.jar
-rw-r--r--. 1 ruoze games 1419544 May 8 06:35 kafka-clients-0.11.0.1.jar
-rw-r--r--. 1 ruoze games 324016 May 8 06:34 kafka-clients-0.8.2.2.jar
-rw-r--r--. 1 ruoze games 641408 May 8 06:34 kafka-clients-0.9.0.1.jar
[root@hadoop000 kafka-clients]#
