Setting Up a ZooKeeper 3.5 Cluster with an Ansible Playbook

2025-05-19 06:01

Environment:

OS: CentOS 8

ZooKeeper nodes: 192.168.1.61, 192.168.1.62, 192.168.1.63

Steps at a Glance

Prepare the environment: make sure Ansible is installed and can log in to every target host over passwordless SSH.

Create the Ansible inventory: a hosts file listing all target hosts.

Write the Ansible playbook: a single YAML file that performs the whole cluster setup and configuration.

Run the Ansible playbook with the ansible-playbook command, as shown below.
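A typical invocation, using the inventory and playbook filenames from this article (hosts and zookeeper_cluster_setup.yml):

    ansible-playbook -i hosts zookeeper_cluster_setup.yml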


Detailed Configuration Steps

  1. Create the Ansible inventory file hosts

    [hadoop-master]

    hadoop-master-1 ansible_host=192.168.1.61

    hadoop-master-2 ansible_host=192.168.1.62

    [hadoop-slave]

    hadoop-slave-1 ansible_host=192.168.1.61

    hadoop-slave-2 ansible_host=192.168.1.62

    hadoop-slave-3 ansible_host=192.168.1.63

    [zookeeper]

    zookeeper-1 ansible_host=192.168.1.61

    zookeeper-2 ansible_host=192.168.1.62

    zookeeper-3 ansible_host=192.168.1.63

    [hadoop-all:children]

    hadoop-master

    hadoop-slave

    [hadoop-all:vars]

    ansible_user=root
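With the inventory in place, it is worth confirming that Ansible can actually reach every host before going further; an ad-hoc ping against the hadoop-all group should report SUCCESS for each node:

    ansible -i hosts hadoop-all -m ping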

  2. Configure the ZooKeeper cluster

    The playbook is as follows:

    zookeeper_cluster_setup.yml


    ---
    - name: Setup Hadoop 3.x Cluster with Zookeeper
      hosts: hadoop-all
      become: true
      gather_facts: true
      vars:
        zookeeper_data_dir: /var/lib/zookeeper
        zookeeper_log_dir: /var/log/zookeeper
        zookeeper_version: 3.5.9
        zookeeper_home: /opt/apache-zookeeper-{{ zookeeper_version }}-bin
        zookeeper_user: hadoop
        zookeeper_group: hadoop
        zookeeper_client_port: 2181
        zookeeper_leader_port: 2888
        zookeeper_election_port: 3888

      tasks:
        - name: Update system packages
          dnf:
            name: '*'
            state: latest
          when: ansible_distribution == "CentOS" and ansible_distribution_major_version == "8"

        - name: Disable SELinux
          selinux:
            state: disabled

        - name: Stop and disable firewalld
          systemd:
            name: firewalld
            state: stopped
            enabled: false

        # Create the user and group
        - name: Create hadoop group if it doesn't exist
          group:
            name: "{{ zookeeper_group }}"
            state: present

        - name: Create hadoop user if it doesn't exist
          user:
            name: "{{ zookeeper_user }}"
            group: "{{ zookeeper_group }}"
            shell: /bin/bash
            create_home: true
            system: false

        - name: Check if .ssh directory exists
          stat:
            path: /home/{{ zookeeper_user }}/.ssh
          register: ssh_dir

        - name: Generate SSH key if .ssh directory does not exist
          shell: |
            sudo -u {{ zookeeper_user }} mkdir -p /home/{{ zookeeper_user }}/.ssh
            sudo -u {{ zookeeper_user }} ssh-keygen -t rsa -b 2048 -f /home/{{ zookeeper_user }}/.ssh/id_rsa -N ''
          args:
            creates: /home/{{ zookeeper_user }}/.ssh/id_rsa
          when: not ssh_dir.stat.exists

        - name: Collect public keys from all hosts
          shell: "cat /home/{{ zookeeper_user }}/.ssh/id_rsa.pub"
          register: public_keys
          become_user: "{{ zookeeper_user }}"

        - name: Create temporary directory for authorized_keys parts
          file:
            path: /home/{{ zookeeper_user }}/.ssh/keys/
            state: directory
            mode: '0700'
            owner: "{{ zookeeper_user }}"
            group: "{{ zookeeper_group }}"
          delegate_to: "{{ item }}"
          loop: "{{ groups['hadoop-all'] }}"

        - name: Distribute public keys to all hosts
          copy:
            content: "{{ hostvars[item].public_keys.stdout }}"
            dest: "/home/{{ zookeeper_user }}/.ssh/keys/{{ item }}.pub"
            mode: '0600'
            owner: "{{ zookeeper_user }}"
            group: "{{ zookeeper_group }}"
          loop: "{{ groups['hadoop-all'] }}"
          delegate_to: "{{ inventory_hostname }}"

        - name: Assemble authorized_keys file
          assemble:
            src: /home/{{ zookeeper_user }}/.ssh/keys/
            dest: /home/{{ zookeeper_user }}/.ssh/authorized_keys
            mode: '0600'
            owner: "{{ zookeeper_user }}"
            group: "{{ zookeeper_group }}"
          delegate_to: "{{ item }}"
          loop: "{{ groups['hadoop-all'] }}"

        - name: Ensure SSH directory permissions
          file:
            path: /home/{{ zookeeper_user }}/.ssh
            state: directory
            mode: '0700'
            owner: "{{ zookeeper_user }}"
            group: "{{ zookeeper_group }}"

        - name: Clean up temporary directory
          file:
            path: /home/{{ zookeeper_user }}/.ssh/keys/
            state: absent
          delegate_to: "{{ item }}"
          loop: "{{ groups['hadoop-all'] }}"

        - name: Install necessary packages
          dnf:
            name:
              - java-1.8.0-openjdk-devel
              - wget
              - tar
            state: present
          when: ansible_distribution == "CentOS" and ansible_distribution_major_version == "8"

        # Install ZooKeeper
        - name: Download and extract Zookeeper
          unarchive:
            src: https://archive.apache.org/dist/zookeeper/zookeeper-3.5.9/apache-zookeeper-3.5.9-bin.tar.gz
            dest: /opt
            remote_src: true
            creates: "{{ zookeeper_home }}"
          ignore_errors: true

        # Configure ZooKeeper directories
        - name: Create data and log directories
          file:
            path: "{{ item }}"
            state: directory
            owner: hadoop
            group: hadoop
            mode: '0755'
            recurse: true
          loop:
            - "{{ zookeeper_data_dir }}"
            - "{{ zookeeper_log_dir }}"
            - "/opt/apache-zookeeper-{{ zookeeper_version }}-bin"

        # Configure the ZooKeeper ensemble
        - name: Configure Zookeeper
          template:
            src: /opt/ansible/playbook/zookeeper/zoo.cfg.j2
            dest: /opt/apache-zookeeper-3.5.9-bin/conf/zoo.cfg
            owner: "{{ zookeeper_user }}"
            group: "{{ zookeeper_group }}"
            mode: '0644'
          notify:
            - Restart ZooKeeper if running

        # Set myid
        - name: Set Zookeeper myid
          copy:
            content: "{{ groups['hadoop-slave'].index(inventory_hostname) + 1 }}"
            dest: "{{ zookeeper_data_dir }}/myid"
            owner: "{{ zookeeper_user }}"
            group: "{{ zookeeper_group }}"
            mode: '0644'

        # Start ZooKeeper
        - name: Ensure ZooKeeper Service
          become_user: "{{ zookeeper_user }}"
          shell: |
            {{ zookeeper_home }}/bin/zkServer.sh {{ zookeeper_action | default('start') }}
          register: zk_operation
          changed_when: "'STARTED' in zk_operation.stdout or 'STOPPED' in zk_operation.stdout"
          failed_when: false
          when: zookeeper_state | default('started') in ['started', 'stopped', 'restarted']

        # Check ZooKeeper status
        - name: Check ZooKeeper status
          become_user: "{{ zookeeper_user }}"
          shell: "{{ zookeeper_home }}/bin/zkServer.sh status"
          register: zk_status
          changed_when: false
          failed_when: false

        - name: Show ZooKeeper status
          debug:
            var: zk_status.stdout_lines

      # Handler that restarts ZooKeeper when zoo.cfg changes
      handlers:
        - name: Restart ZooKeeper if running
          become_user: "{{ zookeeper_user }}"
          shell: "{{ zookeeper_home }}/bin/zkServer.sh restart"
          when: zookeeper_state | default('started') != 'stopped'
          listen: "Restart ZooKeeper if running"

Walkthrough:

The play runs against every host in the hadoop-all group. Its variables define the ZooKeeper installation path, data directories, user/group, and port configuration:

    vars:
      zookeeper_data_dir: /var/lib/zookeeper
      zookeeper_log_dir: /var/log/zookeeper
      zookeeper_version: 3.5.9
      zookeeper_home: /opt/apache-zookeeper-{{ zookeeper_version }}-bin
      zookeeper_user: hadoop
      zookeeper_group: hadoop
      zookeeper_client_port: 2181
      zookeeper_leader_port: 2888
      zookeeper_election_port: 3888

System preparation

The playbook first performs system initialization tasks:

    - name: Update system packages
      dnf:
        name: '*'
        state: latest
      when: ansible_distribution == "CentOS" and ansible_distribution_major_version == "8"

    - name: Disable SELinux
      selinux:
        state: disabled

    - name: Stop and disable firewalld
      systemd:
        name: firewalld
        state: stopped
        enabled: false

    Of these, only the package update is gated to CentOS 8 by its when condition; disabling SELinux and firewalld runs on every host and keeps them from interfering with the rest of the installation.

    User and SSH configuration

    The playbook creates the hadoop user and group and configures passwordless SSH between the nodes:

    - name: Create hadoop group if it doesn't exist
      group:
        name: "{{ zookeeper_group }}"
        state: present

    - name: Create hadoop user if it doesn't exist
      user:
        name: "{{ zookeeper_user }}"
        group: "{{ zookeeper_group }}"
        shell: /bin/bash
        create_home: true
        system: false

    The SSH section checks whether the .ssh directory exists, generates a key pair when it does not, and then collects and distributes every node's public key so that all nodes can reach each other without passwords:

    - name: Check if .ssh directory exists
      stat:
        path: /home/{{ zookeeper_user }}/.ssh
      register: ssh_dir

    - name: Generate SSH key if .ssh directory does not exist
      shell: |
        sudo -u {{ zookeeper_user }} mkdir -p /home/{{ zookeeper_user }}/.ssh
        sudo -u {{ zookeeper_user }} ssh-keygen -t rsa -b 2048 -f /home/{{ zookeeper_user }}/.ssh/id_rsa -N ''
      args:
        creates: /home/{{ zookeeper_user }}/.ssh/id_rsa
      when: not ssh_dir.stat.exists
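    The collect/distribute/assemble sequence in the full playbook can be condensed considerably. A minimal alternative sketch using slurp plus the ansible.posix.authorized_key module (this assumes the ansible.posix collection is installed; it is an option, not what the playbook above does):

    - name: Read each host's public key
      slurp:
        src: /home/{{ zookeeper_user }}/.ssh/id_rsa.pub
      register: pubkey

    - name: Authorize every host's key on every host
      ansible.posix.authorized_key:
        user: "{{ zookeeper_user }}"
        state: present
        # slurp returns base64-encoded content, hence the b64decode filter
        key: "{{ hostvars[item].pubkey.content | b64decode }}"
      loop: "{{ groups['hadoop-all'] }}"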

    ZooKeeper installation and configuration

    The playbook downloads ZooKeeper from the Apache archive and unpacks it:

    - name: Download and extract Zookeeper
      unarchive:
        src: https://archive.apache.org/dist/zookeeper/zookeeper-3.5.9/apache-zookeeper-3.5.9-bin.tar.gz
        dest: /opt
        remote_src: true
        creates: "{{ zookeeper_home }}"
      ignore_errors: true

    It then creates the required directories and sets ownership and permissions:

    - name: Create data and log directories
      file:
        path: "{{ item }}"
        state: directory
        owner: hadoop
        group: hadoop
        mode: '0755'
        recurse: true
      loop:
        - "{{ zookeeper_data_dir }}"
        - "{{ zookeeper_log_dir }}"
        - "/opt/apache-zookeeper-{{ zookeeper_version }}-bin"

    Cluster configuration and startup

The ZooKeeper configuration file is generated from the template zoo.cfg.j2:

    - name: Configure Zookeeper
      template:
        src: /opt/ansible/playbook/zookeeper/zoo.cfg.j2
        dest: /opt/apache-zookeeper-3.5.9-bin/conf/zoo.cfg
        owner: "{{ zookeeper_user }}"
        group: "{{ zookeeper_group }}"
        mode: '0644'
      notify:
        - Restart ZooKeeper if running

    Each node's unique identifier, myid, is generated from its index within the hadoop-slave group:

    - name: Set Zookeeper myid
      copy:
        content: "{{ groups['hadoop-slave'].index(inventory_hostname) + 1 }}"
        dest: "{{ zookeeper_data_dir }}/myid"
        owner: "{{ zookeeper_user }}"
        group: "{{ zookeeper_group }}"
        mode: '0644'

    Service management

    The playbook can start, stop, and restart ZooKeeper:

    - name: Ensure ZooKeeper Service
      become_user: "{{ zookeeper_user }}"
      shell: |
        {{ zookeeper_home }}/bin/zkServer.sh {{ zookeeper_action | default('start') }}
      register: zk_operation
      changed_when: "'STARTED' in zk_operation.stdout or 'STOPPED' in zk_operation.stdout"
      failed_when: false
      when: zookeeper_state | default('started') in ['started', 'stopped', 'restarted']
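    Which action runs is controlled by the zookeeper_action and zookeeper_state variables (with the defaults shown, the service is simply started). Both can be overridden on the command line, for example:

    ansible-playbook -i hosts zookeeper_cluster_setup.yml -e zookeeper_state=restarted -e zookeeper_action=restart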

    The service state is then checked with zkServer.sh status and printed:

    - name: Check ZooKeeper status
      become_user: "{{ zookeeper_user }}"
      shell: "{{ zookeeper_home }}/bin/zkServer.sh status"
      register: zk_status
      changed_when: false
      failed_when: false

    - name: Show ZooKeeper status
      debug:
        var: zk_status.stdout_lines
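    On a healthy three-node ensemble you should see one leader and two followers; the debug output for a follower typically ends with a Mode line, roughly as follows (exact lines vary by version):

    "zk_status.stdout_lines": [
        "ZooKeeper JMX enabled by default",
        "Using config: /opt/apache-zookeeper-3.5.9-bin/conf/zoo.cfg",
        "Mode: follower"
    ]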

    Caveats:

    Inconsistent variable usage:

    Some tasks use {{ zookeeper_user }} and {{ zookeeper_group }} while others hardcode hadoop. Use the variables consistently for maintainability.

    Template path:

    The zoo.cfg.j2 template is referenced at /opt/ansible/playbook/zookeeper/; make sure that path exists on the control node.

    Cluster node identification:

    myid is derived from groups['hadoop-slave'], which can assign master nodes the wrong ID (or none at all). Derive it from groups['zookeeper'] instead, so the IDs line up with the server.N entries that the zoo.cfg.j2 template generates from that same group (the corrected snippet later in this article does exactly that).

    Error handling:

    Several tasks use ignore_errors: true or failed_when: false, which can mask installation problems. Handle specific failure modes instead of ignoring errors wholesale.

    Version pinning:

    The ZooKeeper version 3.5.9 is hardcoded into the download URL and install paths. Derive both from the zookeeper_version variable, as in the sketch below.
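    A version-parameterized variant of the download task (a sketch; it assumes the same archive.apache.org directory layout as the hardcoded URL above):

    - name: Download and extract Zookeeper
      unarchive:
        src: https://archive.apache.org/dist/zookeeper/zookeeper-{{ zookeeper_version }}/apache-zookeeper-{{ zookeeper_version }}-bin.tar.gz
        dest: /opt
        remote_src: true
        creates: "{{ zookeeper_home }}"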

    zoo.cfg.j2

    tickTime=2000

    initLimit=10

    syncLimit=5

    dataDir={{ zookeeper_data_dir }}

    clientPort={{ zookeeper_client_port }}

    dataLogDir={{ zookeeper_log_dir }}

    autopurge.snapRetainCount=3

    autopurge.purgeInterval=24

    {% for host in groups['zookeeper'] %}
    server.{{ loop.index }}={{ hostvars[host]['ansible_host'] }}:{{ zookeeper_leader_port }}:{{ zookeeper_election_port }}
    {% endfor %}

A detailed walkthrough of the resulting ZooKeeper configuration file (zoo.cfg): what each setting does, what its value means, and how the template syntax works.

I. Basic settings

tickTime=2000

Purpose:

Defines ZooKeeper's basic time unit in milliseconds; heartbeats and timeouts are expressed in terms of it.

Other time settings (such as initLimit and syncLimit) are multiples of tickTime.

Value:

2000 means 2 seconds, the base time unit for the ensemble.

initLimit=10

Purpose:

The maximum time a follower may take to finish initial synchronization when connecting to the leader.

The effective timeout is initLimit * tickTime = 10 * 2000 = 20000 ms (20 s).

When it applies:

At first startup of the cluster or after a node restart, giving followers enough time to sync their data.

syncLimit=5

Purpose:

The maximum time allowed for a follower to synchronize with the leader during normal operation.

The effective timeout is syncLimit * tickTime = 5 * 2000 = 10000 ms (10 s).

When it applies:

During normal operation, ensuring followers keep up with the leader's state changes in a timely fashion.

dataDir={{ zookeeper_data_dir }}

Purpose:

The directory where ZooKeeper stores data snapshots and, by default, transaction logs.

Value:

{{ zookeeper_data_dir }} is an Ansible variable, defined here as /var/lib/zookeeper.

Note: the directory must exist beforehand and be writable by the hadoop user.

clientPort={{ zookeeper_client_port }}

Purpose:

The port on which ZooKeeper listens for client connections.

Value:

{{ zookeeper_client_port }} is an Ansible variable with a default of 2181.

Clients (such as Hadoop) talk to ZooKeeper over this port.
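To verify the port from the client side, the CLI bundled with ZooKeeper can connect to any ensemble member, e.g.:

    /opt/apache-zookeeper-3.5.9-bin/bin/zkCli.sh -server 192.168.1.61:2181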

dataLogDir={{ zookeeper_log_dir }}

Purpose:

A dedicated directory for ZooKeeper's transaction logs, kept separate from the snapshot directory.

Value:

{{ zookeeper_log_dir }} is an Ansible variable, defined here as /var/log/zookeeper.

Separating logs from data improves I/O performance.

autopurge.snapRetainCount=3

Purpose:

The number of recent snapshots to keep when autopurge runs.

Value:

Snapshots beyond the 3 most recent are deleted, keeping disk usage in check.

autopurge.purgeInterval=24

Purpose:

The interval, in hours, between automatic purges of old snapshots and logs.

Value:

24 means the purge runs once a day.

The value must be an integer of 1 or more; 0 disables autopurge.

II. Cluster settings (playbook template syntax)

The {% for host in groups['zookeeper'] %} loop

Purpose:

Dynamically generates the ensemble member list, so the same template works for any number of nodes.

Syntax:

groups['zookeeper']: references the zookeeper group from the Ansible inventory, i.e. all ensemble nodes.

loop.index: the loop counter, starting at 1, used as the node ID (the N in server.N).

server.{{ loop.index }}={{ hostvars[host]['ansible_host'] }}:{{ zookeeper_leader_port }}:{{ zookeeper_election_port }}

Purpose:

Defines the address and ports of each ensemble member.

Parameters:

server.N: the node's unique ID (N must match the value in that node's dataDir/myid file).

{{ hostvars[host]['ansible_host'] }}: the node's actual IP address, taken from the inventory.

{{ zookeeper_leader_port }}: the leader/follower communication port, default 2888.

{{ zookeeper_election_port }}: the leader-election port, default 3888.

Example output (for the three-node cluster above):

server.1=192.168.1.61:2888:3888

server.2=192.168.1.62:2888:3888

server.3=192.168.1.63:2888:3888

III. Conditions for the configuration to take effect

Directory permissions:

The dataDir and dataLogDir directories must be created by (and writable for) the hadoop user, which the playbook handles with the file module:

    - name: Create data and log directories
      file:
        path: "{{ item }}"
        state: directory
        owner: hadoop
        group: hadoop
        mode: '0755'
        recurse: true
      loop:
        - "{{ zookeeper_data_dir }}"
        - "{{ zookeeper_log_dir }}"

    Matching myid files:

    The contents of dataDir/myid on each node must match that node's server.N ID, which the copy module generates:

    - name: Set Zookeeper myid
      copy:
        content: "{{ groups['zookeeper'].index(inventory_hostname) + 1 }}"  # index is 0-based; +1 maps to server.1, server.2, ...
        dest: "{{ zookeeper_data_dir }}/myid"

    Open ports:

    The clientPort (2181), leader port (2888), and election port (3888) must not be blocked by a firewall on any node.
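    If you would rather open the ports than disable firewalld outright (the playbook above disables it), the standard firewall-cmd invocations are:

    firewall-cmd --permanent --add-port=2181/tcp
    firewall-cmd --permanent --add-port=2888/tcp
    firewall-cmd --permanent --add-port=3888/tcp
    firewall-cmd --reload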

    Common issues

    Cluster nodes cannot communicate

    Possible causes:

    server.N does not match the node's myid.

    Ports are blocked by a firewall (e.g. firewalld was not stopped).

    Fixes:

    Check that each myid file matches the corresponding server.N in zoo.cfg.

    Stop the firewall or open the required ports.
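    A quick way to compare the myid files across the cluster is an ad-hoc command (using the inventory file from this article):

    ansible -i hosts zookeeper -m command -a 'cat /var/lib/zookeeper/myid'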

    Data directory running out of disk

    Suggestions:

    Mount dataDir and dataLogDir on dedicated disks so they cannot fill the system disk.

    Tune autopurge.snapRetainCount and autopurge.purgeInterval to reduce disk usage.

    Cluster performance bottlenecks

    Suggestions:

    Increase tickTime (e.g. to 5000) to lower the heartbeat frequency and reduce network load.

    Increase initLimit and syncLimit to cope with high-latency networks; see the sketch below.
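    Illustrative values only; tune them against the latency you actually observe:

    tickTime=5000
    initLimit=20
    syncLimit=10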
