Environment:
OS: CentOS 8
ZooKeeper nodes: 192.168.1.61, 192.168.1.62, 192.168.1.63
Overview of the steps
Prepare the environment: make sure Ansible is installed and can reach every target host over passwordless SSH.
Create an Ansible inventory file: a hosts file that lists all target hosts.
Write the Ansible playbook: a YAML file that performs the entire cluster setup and configuration.
Run the Ansible playbook: execute the YAML file with the ansible-playbook command, for example as shown below.
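A typical invocation, assuming the inventory file is saved as hosts and the playbook as zookeeper_cluster_setup.yml in the current directory:

# Optionally dry-run with --check first, then apply for real
ansible-playbook -i hosts zookeeper_cluster_setup.yml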
Detailed configuration steps
Create the Ansible inventory file hosts:
[hadoop-master]
hadoop-master-1 ansible_host=192.168.1.61
hadoop-master-2 ansible_host=192.168.1.62
[hadoop-slave]
hadoop-slave-1 ansible_host=192.168.1.61
hadoop-slave-2 ansible_host=192.168.1.62
hadoop-slave-3 ansible_host=192.168.1.63
[zookeeper]
zookeeper-1 ansible_host=192.168.1.61
zookeeper-2 ansible_host=192.168.1.62
zookeeper-3 ansible_host=192.168.1.63
[hadoop-all:children]
hadoop-master
hadoop-slave
[hadoop-all:vars]
ansible_user=root
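Before running the playbook, confirm that Ansible can actually reach every host it will manage. A minimal connectivity check, assuming root SSH access is already set up:

# Ping every host the playbook will manage
ansible -i hosts hadoop-all -m ping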
Configure the ZooKeeper cluster
The playbook is as follows:
zookeeper_cluster_setup.yml
---
- name: Setup Hadoop 3.x Cluster with Zookeeper
  hosts: hadoop-all
  become: true
  gather_facts: true
  vars:
    zookeeper_data_dir: /var/lib/zookeeper
    zookeeper_log_dir: /var/log/zookeeper
    zookeeper_version: 3.5.9
    zookeeper_home: "/opt/apache-zookeeper-{{ zookeeper_version }}-bin"
    zookeeper_user: hadoop
    zookeeper_group: hadoop
    zookeeper_client_port: 2181
    zookeeper_leader_port: 2888
    zookeeper_election_port: 3888
  tasks:
    - name: Update system packages
      dnf:
        name: '*'
        state: latest
      when: ansible_distribution == "CentOS" and ansible_distribution_major_version == "8"

    - name: Disable SELinux
      selinux:
        state: disabled

    - name: Stop and disable firewalld
      systemd:
        name: firewalld
        state: stopped
        enabled: false
    # Create the hadoop user and group
    - name: Create hadoop group if it doesn't exist
      group:
        name: "{{ zookeeper_group }}"
        state: present

    - name: Create hadoop user if it doesn't exist
      user:
        name: "{{ zookeeper_user }}"
        group: "{{ zookeeper_group }}"
        shell: /bin/bash
        create_home: true
        system: false
    - name: Check if .ssh directory exists
      stat:
        path: /home/{{ zookeeper_user }}/.ssh
      register: ssh_dir

    - name: Generate SSH key if .ssh directory does not exist
      shell: |
        sudo -u {{ zookeeper_user }} mkdir -p /home/{{ zookeeper_user }}/.ssh
        sudo -u {{ zookeeper_user }} ssh-keygen -t rsa -b 2048 -f /home/{{ zookeeper_user }}/.ssh/id_rsa -N ''
      args:
        creates: /home/{{ zookeeper_user }}/.ssh/id_rsa
      when: not ssh_dir.stat.exists

    - name: Collect public keys from all hosts
      shell: "cat /home/{{ zookeeper_user }}/.ssh/id_rsa.pub"
      register: public_keys
      become_user: "{{ zookeeper_user }}"

    - name: Create temporary directory for authorized_keys parts
      file:
        path: /home/{{ zookeeper_user }}/.ssh/keys/
        state: directory
        mode: '0700'
        owner: "{{ zookeeper_user }}"
        group: "{{ zookeeper_group }}"
      delegate_to: "{{ item }}"
      loop: "{{ groups['hadoop-all'] }}"
    - name: Distribute public keys to all hosts
      copy:
        content: "{{ hostvars[item].public_keys.stdout }}"
        dest: "/home/{{ zookeeper_user }}/.ssh/keys/{{ item }}.pub"
        mode: '0600'
        owner: "{{ zookeeper_user }}"
        group: "{{ zookeeper_group }}"
      loop: "{{ groups['hadoop-all'] }}"
      delegate_to: "{{ inventory_hostname }}"

    - name: Assemble authorized_keys file
      assemble:
        src: /home/{{ zookeeper_user }}/.ssh/keys/
        dest: /home/{{ zookeeper_user }}/.ssh/authorized_keys
        mode: '0600'
        owner: "{{ zookeeper_user }}"
        group: "{{ zookeeper_group }}"
      delegate_to: "{{ item }}"
      loop: "{{ groups['hadoop-all'] }}"

    - name: Ensure SSH directory permissions
      file:
        path: /home/{{ zookeeper_user }}/.ssh
        state: directory
        mode: '0700'
        owner: "{{ zookeeper_user }}"
        group: "{{ zookeeper_group }}"

    - name: Clean up temporary directory
      file:
        path: /home/{{ zookeeper_user }}/.ssh/keys/
        state: absent
      delegate_to: "{{ item }}"
      loop: "{{ groups['hadoop-all'] }}"
    - name: Install necessary packages
      dnf:
        name:
          - java-1.8.0-openjdk-devel
          - wget
          - tar
        state: present
      when: ansible_distribution == "CentOS" and ansible_distribution_major_version == "8"

    # Install ZooKeeper
    - name: Download and extract Zookeeper
      unarchive:
        src: https://archive.apache.org/dist/zookeeper/zookeeper-3.5.9/apache-zookeeper-3.5.9-bin.tar.gz
        dest: /opt
        remote_src: true
        creates: "{{ zookeeper_home }}"
      ignore_errors: true
    # Configure ZooKeeper
    - name: Create data and log directories
      file:
        path: "{{ item }}"
        state: directory
        owner: hadoop
        group: hadoop
        mode: '0755'
        recurse: true
      loop:
        - "{{ zookeeper_data_dir }}"
        - "{{ zookeeper_log_dir }}"
        - "/opt/apache-zookeeper-{{ zookeeper_version }}-bin"

    # Render the cluster configuration
    - name: Configure Zookeeper
      template:
        src: /opt/ansible/playbook/zookeeper/zoo.cfg.j2
        dest: /opt/apache-zookeeper-3.5.9-bin/conf/zoo.cfg
        owner: "{{ zookeeper_user }}"
        group: "{{ zookeeper_group }}"
        mode: '0644'
      notify:
        - Restart ZooKeeper if running

    # Write each node's myid
    - name: Set Zookeeper myid
      copy:
        content: "{{ groups['hadoop-slave'].index(inventory_hostname) + 1 }}"
        dest: "{{ zookeeper_data_dir }}/myid"
        owner: "{{ zookeeper_user }}"
        group: "{{ zookeeper_group }}"
        mode: '0644'
    # Start ZooKeeper (zookeeper_action can be start, stop, or restart)
    - name: Ensure ZooKeeper Service
      become_user: "{{ zookeeper_user }}"
      shell: |
        {{ zookeeper_home }}/bin/zkServer.sh {{ zookeeper_action | default('start') }}
      register: zk_operation
      changed_when: "'STARTED' in zk_operation.stdout or 'STOPPED' in zk_operation.stdout"
      failed_when: false
      when: zookeeper_action | default('start') in ['start', 'stop', 'restart']

    # Check ZooKeeper status
    - name: Check ZooKeeper status
      become_user: "{{ zookeeper_user }}"
      shell: "{{ zookeeper_home }}/bin/zkServer.sh status"
      register: zk_status
      changed_when: false
      failed_when: false

    - name: Show ZooKeeper status
      debug:
        var: zk_status.stdout_lines

  # Handler: restart ZooKeeper when zoo.cfg changes, unless it is being stopped
  handlers:
    - name: Restart ZooKeeper if running
      become_user: "{{ zookeeper_user }}"
      shell: "{{ zookeeper_home }}/bin/zkServer.sh restart"
      when: zookeeper_action | default('start') != 'stop'
      listen: "Restart ZooKeeper if running"
Explanation:
The play, named "Setup Hadoop 3.x Cluster with Zookeeper", runs against every host in the hadoop-all group. The main variables define the ZooKeeper installation path, data directories, user information, and port configuration:
vars:
  zookeeper_data_dir: /var/lib/zookeeper
  zookeeper_log_dir: /var/log/zookeeper
  zookeeper_version: 3.5.9
  zookeeper_home: "/opt/apache-zookeeper-{{ zookeeper_version }}-bin"
  zookeeper_user: hadoop
  zookeeper_group: hadoop
  zookeeper_client_port: 2181
  zookeeper_leader_port: 2888
  zookeeper_election_port: 3888
System preparation
The playbook first runs the system initialization tasks:
- name: Update system packages
  dnf:
    name: '*'
    state: latest
  when: ansible_distribution == "CentOS" and ansible_distribution_major_version == "8"

- name: Disable SELinux
  selinux:
    state: disabled

- name: Stop and disable firewalld
  systemd:
    name: firewalld
    state: stopped
    enabled: false

The package update is guarded so it only runs on CentOS 8; SELinux is disabled and firewalld is stopped on all hosts so that the subsequent installation steps are not blocked.
User and SSH configuration
The playbook creates the hadoop user and group and sets up passwordless SSH:
- name: Create hadoop group if it doesn't exist
  group:
    name: "{{ zookeeper_group }}"
    state: present

- name: Create hadoop user if it doesn't exist
  user:
    name: "{{ zookeeper_user }}"
    group: "{{ zookeeper_group }}"
    shell: /bin/bash
    create_home: true
    system: false

The SSH section checks whether the .ssh directory exists and generates an SSH key pair if it does not; it then collects and distributes the public keys so that every node can log in to every other node without a password:
- name: Check if .ssh directory exists
  stat:
    path: /home/{{ zookeeper_user }}/.ssh
  register: ssh_dir

- name: Generate SSH key if .ssh directory does not exist
  shell: |
    sudo -u {{ zookeeper_user }} mkdir -p /home/{{ zookeeper_user }}/.ssh
    sudo -u {{ zookeeper_user }} ssh-keygen -t rsa -b 2048 -f /home/{{ zookeeper_user }}/.ssh/id_rsa -N ''
  args:
    creates: /home/{{ zookeeper_user }}/.ssh/id_rsa
  when: not ssh_dir.stat.exists
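Once the play has run, passwordless SSH between nodes can be spot-checked by hand. A quick sketch, assuming the hadoop user and the inventory IPs from above:

# Should print the remote hostname without asking for a password
sudo -u hadoop ssh -o StrictHostKeyChecking=no hadoop@192.168.1.62 hostname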
ZooKeeper installation and configuration
The playbook downloads ZooKeeper from the Apache archive and unpacks it:
- name: Download and extract Zookeeper
  unarchive:
    src: https://archive.apache.org/dist/zookeeper/zookeeper-3.5.9/apache-zookeeper-3.5.9-bin.tar.gz
    dest: /opt
    remote_src: true
    creates: "{{ zookeeper_home }}"
  ignore_errors: true
It then creates the required directories and sets their ownership and permissions:
- name: Create data and log directories
  file:
    path: "{{ item }}"
    state: directory
    owner: hadoop
    group: hadoop
    mode: '0755'
    recurse: true
  loop:
    - "{{ zookeeper_data_dir }}"
    - "{{ zookeeper_log_dir }}"
    - "/opt/apache-zookeeper-{{ zookeeper_version }}-bin"
Cluster configuration and startup
The ZooKeeper configuration is generated from the template file zoo.cfg.j2:
- name: Configure Zookeeper
  template:
    src: /opt/ansible/playbook/zookeeper/zoo.cfg.j2
    dest: /opt/apache-zookeeper-3.5.9-bin/conf/zoo.cfg
    owner: "{{ zookeeper_user }}"
    group: "{{ zookeeper_group }}"
    mode: '0644'
  notify:
    - Restart ZooKeeper if running
Each node's unique identifier, myid, is generated from the node's index in the hadoop-slave group:
- name: Set Zookeeper myid
  copy:
    content: "{{ groups['hadoop-slave'].index(inventory_hostname) + 1 }}"
    dest: "{{ zookeeper_data_dir }}/myid"
    owner: "{{ zookeeper_user }}"
    group: "{{ zookeeper_group }}"
    mode: '0644'
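The generated file is easy to verify on each node. A minimal check, assuming zookeeper_data_dir is /var/lib/zookeeper as defined in vars:

# Each node should print a different small integer (1, 2, 3, ...)
cat /var/lib/zookeeper/myid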
Service management
The playbook can start, stop, or restart ZooKeeper:
- name: Ensure ZooKeeper Service
  become_user: "{{ zookeeper_user }}"
  shell: |
    {{ zookeeper_home }}/bin/zkServer.sh {{ zookeeper_action | default('start') }}
  register: zk_operation
  changed_when: "'STARTED' in zk_operation.stdout or 'STOPPED' in zk_operation.stdout"
  failed_when: false
  when: zookeeper_action | default('start') in ['start', 'stop', 'restart']
Running zkServer.sh status checks the service state and prints the result:
- name: Check ZooKeeper status
  become_user: "{{ zookeeper_user }}"
  shell: "{{ zookeeper_home }}/bin/zkServer.sh status"
  register: zk_status
  changed_when: false
  failed_when: false

- name: Show ZooKeeper status
  debug:
    var: zk_status.stdout_lines
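The same check can also be run manually on any node. On a healthy three-node ensemble, zkServer.sh status normally reports "Mode: leader" on exactly one node and "Mode: follower" on the other two. A sketch, assuming the zookeeper_home path from vars:

# Run as the hadoop user on each node
sudo -u hadoop /opt/apache-zookeeper-3.5.9-bin/bin/zkServer.sh status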
Notes:
Inconsistent variable usage:
Some tasks use {{ zookeeper_user }} and {{ zookeeper_group }}, while others hard-code hadoop.
Use the variable references consistently to improve maintainability.
Template path:
The zoo.cfg.j2 template path is given as /opt/ansible/playbook/zookeeper/; make sure that path exists on the control node.
Cluster node identification:
myid is generated from groups['hadoop-slave'], so the task fails on the hadoop-master aliases, which are not in that group.
Since the zoo.cfg template numbers servers by looping over groups['zookeeper'], derive myid from groups['zookeeper'] as well so the id always matches the server.N entries.
Error handling:
Several tasks use ignore_errors: true or failed_when: false, which can mask installation problems.
Handle specific error conditions instead of ignoring errors wholesale.
Version pinning:
The ZooKeeper version 3.5.9 is hard-coded in several places; reference the zookeeper_version variable instead.
The download URL is also hard-coded and should be derived from the version variable, as sketched below.
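A minimal sketch of a version-driven download task (illustrative only, not the playbook's current code); bumping zookeeper_version would then be enough to upgrade:

- name: Download and extract Zookeeper
  unarchive:
    src: "https://archive.apache.org/dist/zookeeper/zookeeper-{{ zookeeper_version }}/apache-zookeeper-{{ zookeeper_version }}-bin.tar.gz"
    dest: /opt
    remote_src: true
    creates: "{{ zookeeper_home }}"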
zoo.cfg.j2
tickTime=2000
initLimit=10
syncLimit=5
dataDir={{ zookeeper_data_dir }}
clientPort={{ zookeeper_client_port }}
dataLogDir={{ zookeeper_log_dir }}
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
{% for host in groups['zookeeper'] %}
server.{{ loop.index }}={{ hostvars[host]['ansible_host'] }}:{{ zookeeper_leader_port }}:{{ zookeeper_election_port }}
{% endfor %}
A detailed walkthrough of the ZooKeeper configuration file (zoo.cfg): what each setting does, its parameters, and the template syntax used to generate it.
1. Basic settings
tickTime=2000
Purpose:
Defines ZooKeeper's basic time unit (in milliseconds), used to drive heartbeats and timeouts.
Other time settings (such as initLimit and syncLimit) are expressed as multiples of tickTime.
Parameter:
2000 means 2 seconds, the cluster's base time unit.
initLimit=10
Purpose:
The maximum time a follower may take to connect to the leader and complete its initial synchronization.
The effective timeout is initLimit * tickTime = 10 * 2000 = 20000 ms (20 seconds).
When it applies:
On first cluster startup or node restarts, it gives followers enough time to sync their data.
syncLimit=5
Purpose:
The maximum time allowed for a follower to synchronize state with the leader.
The effective timeout is syncLimit * tickTime = 5 * 2000 = 10000 ms (10 seconds).
When it applies:
During normal operation, it ensures followers keep up with the leader's state changes in time.
dataDir={{ zookeeper_data_dir }}
Purpose:
The directory where ZooKeeper stores data snapshots (and, unless dataLogDir is set, transaction logs).
Parameter:
{{ zookeeper_data_dir }} is an Ansible variable, defined here as /var/lib/zookeeper.
Note: the directory must be created in advance and be writable by the hadoop user.
clientPort={{ zookeeper_client_port }}
Purpose:
The port on which ZooKeeper listens for client connections.
Parameter:
{{ zookeeper_client_port }} is an Ansible variable with a default of 2181.
Clients (such as Hadoop) talk to ZooKeeper on this port.
dataLogDir={{ zookeeper_log_dir }}
Purpose:
A separate directory for the transaction logs, kept apart from the snapshot directory.
Parameter:
{{ zookeeper_log_dir }} is an Ansible variable, defined here as /var/log/zookeeper.
Separating logs from data improves I/O performance.
autopurge.snapRetainCount=3
Purpose:
The number of recent snapshots to keep when automatic cleanup runs.
Parameter:
Snapshots beyond the most recent 3 are deleted to keep disk usage in check.
autopurge.purgeInterval=24
Purpose:
The interval, in hours, between automatic purges of old snapshots and logs.
Parameter:
24 means one cleanup per day.
Set this to 1 or a larger integer; 0 disables automatic cleanup.
2. Cluster settings (playbook template syntax)
The {% for host in groups['zookeeper'] %} loop
Purpose:
Dynamically generates the cluster's node entries, so the template works for any number of nodes.
Template syntax:
groups['zookeeper']: references the zookeeper group defined in the Ansible inventory, i.e. all cluster nodes.
loop.index: the loop counter, starting at 1, used to generate the node ID (server.id).
server.{{ loop.index }}={{ hostvars[host]['ansible_host'] }}:{{ zookeeper_leader_port }}:{{ zookeeper_election_port }}
Purpose:
Defines the address and ports of each node in the cluster.
Parameters:
server.id: the node's unique identifier (id must match the value in that node's dataDir/myid file).
{{ hostvars[host]['ansible_host'] }}: the node's actual IP address, taken from the Ansible inventory.
{{ zookeeper_leader_port }}: the leader/follower communication port, default 2888.
{{ zookeeper_election_port }}: the leader election port, default 3888.
Example output (for the 3-node cluster above):
server.1=192.168.1.61:2888:3888
server.2=192.168.1.62:2888:3888
server.3=192.168.1.63:2888:3888
3. Conditions for the configuration to take effect
Directory permissions:
The dataDir and dataLogDir directories must be owned and writable by the hadoop user (set up with Ansible's file module):
- name: Create data and log directories
  file:
    path: "{{ item }}"
    state: directory
    owner: hadoop
    group: hadoop
    mode: '0755'
    recurse: true
  loop:
    - "{{ zookeeper_data_dir }}"
    - "{{ zookeeper_log_dir }}"
Matching myid files:
The content of each node's dataDir/myid file must match the id in its server.id entry (generated with Ansible's copy module):
- name: Set Zookeeper myid
  copy:
    content: "{{ groups['zookeeper'].index(inventory_hostname) + 1 }}"  # the index starts at 0; +1 maps to server.1, server.2, ...
    dest: "{{ zookeeper_data_dir }}/myid"
Open ports:
The clientPort (2181), leader port (2888), and election port (3888) must not be blocked by a firewall on any node; see the firewall-cmd sketch below.
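If the firewall must remain enabled rather than being stopped by the playbook, the three ports can be opened explicitly instead. A sketch using the standard firewall-cmd tool on CentOS 8:

# Open the client, quorum, and election ports, then reload the rules
firewall-cmd --permanent --add-port=2181/tcp
firewall-cmd --permanent --add-port=2888/tcp
firewall-cmd --permanent --add-port=3888/tcp
firewall-cmd --reload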
Troubleshooting
Cluster nodes cannot communicate
Possible causes:
server.id does not match the node's myid.
Ports are blocked by a firewall (e.g., firewalld was not stopped).
Fixes:
Check that each myid file matches the server.id entries in zoo.cfg (a quick check follows below).
Stop the firewall or open the required ports.
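A quick consistency check, assuming the default paths used by the playbook:

# Compare this node's id with the server.N entries in its config
cat /var/lib/zookeeper/myid
grep '^server\.' /opt/apache-zookeeper-3.5.9-bin/conf/zoo.cfg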
Data directory running out of disk
Suggestions:
Mount dataDir and dataLogDir on dedicated disks so they do not compete with the system disk.
Tune autopurge.snapRetainCount and autopurge.purgeInterval to reduce disk usage.
Cluster performance bottlenecks
Suggestions:
Increase tickTime (e.g., to 5000) to lower the heartbeat frequency and reduce network load.
Increase initLimit and syncLimit to tolerate high-latency networks, as sketched below.
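For illustration, a zoo.cfg tuned for a high-latency network might use the following values (the effective timeouts become initLimit * tickTime = 100 s and syncLimit * tickTime = 50 s):

tickTime=5000
initLimit=20
syncLimit=10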