<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Big Data Environment Setup on TrueSolのblog</title><link>https://blog.arcanelune.top/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE%E7%8E%AF%E5%A2%83%E6%90%AD%E5%BB%BA/</link><description>Recent content from TrueSolのblog</description><generator>Hugo</generator><language>zh-cn</language><managingEditor>3359429309@qq.com (TrueSol)</managingEditor><webMaster>3359429309@qq.com (TrueSol)</webMaster><copyright>Unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting!</copyright><lastBuildDate>Tue, 12 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.arcanelune.top/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE%E7%8E%AF%E5%A2%83%E6%90%AD%E5%BB%BA/index.xml" rel="self" type="application/rss+xml"/><item><title>Complete Prerequisite Environment Setup for a Big Data Cluster (CentOS 7.9)</title><link>https://blog.arcanelune.top/post/cluster-pre-environment/bigdata/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><author>3359429309@qq.com (TrueSol)</author><guid>https://blog.arcanelune.top/post/cluster-pre-environment/bigdata/</guid><description>
<![CDATA[<h1>Complete Prerequisite Environment Setup for a Big Data Cluster (CentOS 7.9)</h1><p>Author: TrueSol (3359429309@qq.com)</p>
        
          <blockquote>
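<p>Setting up one big data environment after another was driving me crazy, so I wrote this cluster prerequisite tutorial to keep a record.</p>
</blockquote>
<blockquote>
<p>Environment: three CentOS 7.9 virtual machines</p>
</blockquote>
<blockquote>
<p>Hostnames: hadoop100, hadoop101, hadoop102</p>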
</blockquote>
<h2 id="一前置环境配置">
<a class="header-anchor" href="#%e4%b8%80%e5%89%8d%e7%bd%ae%e7%8e%af%e5%a2%83%e9%85%8d%e7%bd%ae"></a>
I. Prerequisite Environment Setup
</h2><h3 id="1最小化安装额外需要的配置">
<a class="header-anchor" href="#1%e6%9c%80%e5%b0%8f%e5%8c%96%e5%ae%89%e8%a3%85%e9%a2%9d%e5%a4%96%e9%9c%80%e8%a6%81%e7%9a%84%e9%85%8d%e7%bd%ae"></a>
(1) Extra setup needed after a minimal install
</h3><h4 id="修改yum源">
<a class="header-anchor" href="#%e4%bf%ae%e6%94%b9yum%e6%ba%90"></a>
Switch the yum repositories
</h4><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">curl -o /etc/yum.repos.d/CentOS-Base.repo https://repo.huaweicloud.com/repository/conf/CentOS-7-anon.repo
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">curl -o /etc/yum.repos.d/epel-7.repo https://mirrors.aliyun.com/repo/epel-7.repo
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">yum clean all
</span></span><span class="line"><span class="cl">yum makecache
</span></span><span class="line"><span class="cl"><span class="c1"># Optional</span>
</span></span><span class="line"><span class="cl">yum update -y
</span></span></code></pre></div><blockquote>
<p>Do this on every machine.</p>
</blockquote>
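<p>Overwriting <code>CentOS-Base.repo</code> replaces the stock repository definition, so it can help to keep a copy first. A minimal sketch, assuming a hypothetical helper named <code>backup_file</code> (not part of the original steps) that is run as root before the <code>curl</code> commands above:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical helper: copy FILE to FILE.bak if FILE exists
backup_file() {
  local src=$1
  if [ -f "$src" ]; then
    # -p keeps the original mode and timestamps
    cp -p "$src" "$src.bak"
    echo "backed up $src to $src.bak"
  else
    echo "nothing to back up: $src"
  fi
}
</code></pre></div>
<p>For example, <code>backup_file /etc/yum.repos.d/CentOS-Base.repo</code> leaves the old definition in <code>CentOS-Base.repo.bak</code>. yum only reads files ending in <code>.repo</code>, so the backup does not interfere with the new configuration.</p>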
<h4 id="安装必备软件包">
<a class="header-anchor" href="#%e5%ae%89%e8%a3%85%e5%bf%85%e5%a4%87%e8%bd%af%e4%bb%b6%e5%8c%85"></a>
Install the required packages
</h4><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">yum install -y epel-release
</span></span><span class="line"><span class="cl">yum install -y net-tools
</span></span><span class="line"><span class="cl">yum install -y vim
</span></span><span class="line"><span class="cl">yum install -y rsync
</span></span></code></pre></div><blockquote>
<p>Install these on every machine.</p>
</blockquote>
<h3 id="2设置ip与主机名">
<a class="header-anchor" href="#2%e8%ae%be%e7%bd%aeip%e4%b8%8e%e4%b8%bb%e6%9c%ba%e5%90%8d"></a>
(2) Set the IP address and hostname
</h3><blockquote>
<p>My virtual machines run CentOS 7.9, so the file used to set the IP may differ on other systems.
If your system is different, look up how to set a static IP on it.</p>
</blockquote>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="o">[</span>hadoop@hadoop100 ~<span class="o">]</span>$ sudo vim /etc/sysconfig/network-scripts/ifcfg-ens33
</span></span><span class="line"><span class="cl"><span class="c1"># Change dhcp to static</span>
</span></span><span class="line"><span class="cl"><span class="nv">BOOTPROTO</span><span class="o">=</span><span class="s2">&#34;static&#34;</span>
</span></span><span class="line"><span class="cl"><span class="c1"># Then append the following at the end of the file</span>
</span></span><span class="line"><span class="cl"><span class="c1"># IP address</span>
</span></span><span class="line"><span class="cl"><span class="nv">IPADDR</span><span class="o">=</span>192.168.100.130
</span></span><span class="line"><span class="cl"><span class="c1"># Gateway</span>
</span></span><span class="line"><span class="cl"><span class="nv">GATEWAY</span><span class="o">=</span>192.168.100.2
</span></span><span class="line"><span class="cl"><span class="c1"># DNS</span>
</span></span><span class="line"><span class="cl"><span class="nv">DNS1</span><span class="o">=</span>192.168.100.2
</span></span><span class="line"><span class="cl"><span class="nv">DNS2</span><span class="o">=</span>8.8.8.8
</span></span></code></pre></div><blockquote>
<p>Note: save the file, then run <code>sudo systemctl restart network</code> to restart the network service.</p>
</blockquote>
<blockquote>
<p>Set a static IP on every machine, and no two machines may share the same IP!</p>
</blockquote>
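<p>Since each machine needs its own address, the static settings can be generated per host rather than typed by hand. A minimal sketch, assuming a hypothetical helper <code>render_static_ifcfg</code>; the <code>ONBOOT</code> and <code>PREFIX=24</code> lines are assumptions that do not appear in the original file (a /24 network is assumed), and the DNS values follow the example above:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical helper: print the static-IP lines for one host
# Usage: render_static_ifcfg IP GATEWAY
render_static_ifcfg() {
  local ip=$1 gateway=$2
  printf 'BOOTPROTO="static"\n'
  printf 'ONBOOT="yes"\n'          # assumption: bring the interface up at boot
  printf 'IPADDR=%s\n' "$ip"
  printf 'PREFIX=24\n'             # assumption: /24 network
  printf 'GATEWAY=%s\n' "$gateway"
  printf 'DNS1=%s\n' "$gateway"    # the example above uses the gateway as DNS1
  printf 'DNS2=8.8.8.8\n'
}

# Print the settings for the second machine
render_static_ifcfg 192.168.100.131 192.168.100.2
</code></pre></div>
<p>The output is meant to be reviewed and then merged into <code>ifcfg-ens33</code> by hand; the sketch deliberately does not write to the file itself.</p>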
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Change the hostname</span>
</span></span><span class="line"><span class="cl"><span class="o">[</span>hadoop@hadoop100 ~<span class="o">]</span>$ sudo hostnamectl set-hostname hadoop100
</span></span><span class="line"><span class="cl"><span class="c1"># Use whatever hostname you want; hadoop100 is used here</span>
</span></span></code></pre></div><blockquote>
<p>Do this on every machine!
The prompt only shows the new hostname after you start a new shell, for example by running <code>bash</code>.</p>
</blockquote>
<h3 id="3配置hosts-映射">
<a class="header-anchor" href="#3%e9%85%8d%e7%bd%aehosts-%e6%98%a0%e5%b0%84"></a>
(3) Configure the hosts mapping
</h3><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo vim /etc/hosts
</span></span><span class="line"><span class="cl"><span class="c1"># Append the following to the end of the file</span>
</span></span><span class="line"><span class="cl">192.168.100.130 hadoop100
</span></span><span class="line"><span class="cl">192.168.100.131 hadoop101
</span></span><span class="line"><span class="cl">192.168.100.132 hadoop102
</span></span></code></pre></div><blockquote>
<p>Do this on every machine!</p>
</blockquote>
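<p>If the mapping is scripted rather than edited by hand, it is worth making the append idempotent so that re-running it does not duplicate entries. A minimal sketch, assuming a hypothetical helper <code>add_host_entry</code> (the file path is a parameter so the helper can be tried on a scratch file first):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical helper: append "IP NAME" to FILE unless NAME is already listed
# Usage: add_host_entry FILE IP NAME
add_host_entry() {
  local file=$1 ip=$2 name=$3
  # -w matches NAME as a whole word, -F treats it as a fixed string
  if grep -qwF -- "$name" "$file"; then
    echo "$name already present in $file"
  else
    # tee -a appends the line and also echoes it to the terminal
    printf '%s %s\n' "$ip" "$name" | tee -a "$file"
  fi
}
</code></pre></div>
<p>Run as root, <code>add_host_entry /etc/hosts 192.168.100.131 hadoop101</code> adds the line once; a second call only reports that the name is already present.</p>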
<h3 id="4关闭防火墙">
<a class="header-anchor" href="#4%e5%85%b3%e9%97%ad%e9%98%b2%e7%81%ab%e5%a2%99"></a>
(4) Disable the firewall and SELinux
</h3><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">sudo systemctl stop firewalld
</span></span><span class="line"><span class="cl">sudo systemctl disable firewalld.service
</span></span><span class="line"><span class="cl">sudo setenforce 0
</span></span><span class="line"><span class="cl">sudo sed -i &#39;s/^SELINUX=.*/SELINUX=disabled/&#39; /etc/selinux/config
</span></span></code></pre></div><blockquote>
<p>Do this on every machine!</p>
</blockquote>
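<p>Note that <code>setenforce 0</code> only switches SELinux to permissive mode for the current boot, while the <code>sed</code> edit to <code>/etc/selinux/config</code> is what applies after a reboot. A minimal sketch for checking what the config file now says, assuming a hypothetical helper <code>selinux_config_mode</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical helper: print the value of the SELINUX= line in a config file
# (the pattern does not match the separate SELINUXTYPE= line)
selinux_config_mode() {
  local file=$1
  sed -n 's/^SELINUX=//p' "$file"
}
</code></pre></div>
<p>After the steps above, <code>selinux_config_mode /etc/selinux/config</code> should print <code>disabled</code>. The running state can be checked with <code>getenforce</code>, and the firewall with <code>systemctl is-enabled firewalld</code>.</p>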
<h3 id="5配置免密登录">
<a class="header-anchor" href="#5%e9%85%8d%e7%bd%ae%e5%85%8d%e5%af%86%e7%99%bb%e5%bd%95"></a>
(5) Configure passwordless SSH login
</h3><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">ssh-keygen -t rsa
</span></span></code></pre></div><blockquote>
<p>Run this on every machine, then press Enter three times to accept the defaults.</p>
</blockquote>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">ssh-copy-id hadoop100
</span></span><span class="line"><span class="cl">ssh-copy-id hadoop101
</span></span><span class="line"><span class="cl">ssh-copy-id hadoop102
</span></span></code></pre></div><blockquote>
<p>Run these three commands on every machine and enter the password when prompted.</p>
</blockquote>
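<p>Once the keys are copied, a quick loop can confirm that no host still asks for a password. A minimal sketch, assuming the three hostnames used in this tutorial; <code>check_passwordless_ssh</code> and the <code>SSH_CMD</code> override are hypothetical names introduced here for illustration:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">HOSTS="hadoop100 hadoop101 hadoop102"
# SSH_CMD is a hypothetical override, handy for trying the loop without a cluster
SSH_CMD=${SSH_CMD:-ssh}

check_passwordless_ssh() {
  local host
  for host in $HOSTS; do
    # BatchMode=yes makes ssh fail instead of prompting for a password
    if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "$host: OK"
    else
      echo "$host: FAILED"
    fi
  done
}
</code></pre></div>
<p>Running <code>check_passwordless_ssh</code> on each machine should print <code>OK</code> for all three hosts; a <code>FAILED</code> line usually means <code>ssh-copy-id</code> was skipped for that host.</p>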
<h3 id="6给普通用户配置root权限">
<a class="header-anchor" href="#6%e7%bb%99%e6%99%ae%e9%80%9a%e7%94%a8%e6%88%b7%e9%85%8d%e7%bd%aeroot%e6%9d%83%e9%99%90"></a>
(6) Give the regular user root privileges
</h3><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo visudo
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Add the following line below the %wheel line; sudoers applies the last matching rule</span>
</span></span><span class="line"><span class="cl">hadoop  <span class="nv">ALL</span><span class="o">=(</span>ALL<span class="o">)</span>  NOPASSWD: ALL
</span></span></code></pre></div><blockquote>
<p>Configure this on every machine.
My regular user here is hadoop.</p>
</blockquote>
<h3 id="7编写集群分发脚本">
<a class="header-anchor" href="#7%e7%bc%96%e5%86%99%e9%9b%86%e7%be%a4%e5%88%86%e5%8f%91%e8%84%9a%e6%9c%ac"></a>
(7) Write a cluster distribution script
</h3><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">mkdir -p /home/hadoop/bin
</span></span><span class="line"><span class="cl"><span class="nb">cd</span> /home/hadoop/bin
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">vim xsync
</span></span><span class="line"><span class="cl"><span class="c1"># Enter the following content:</span>
</span></span><span class="line"><span class="cl"><span class="c1">#!/bin/bash</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 1. Define the list of cluster hosts (edit it here in one place)</span>
</span></span><span class="line"><span class="cl"><span class="nv">HOSTS</span><span class="o">=</span><span class="s2">&#34;hadoop100 hadoop101 hadoop102&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 2. Check the number of arguments</span>
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="o">[</span> <span class="nv">$#</span> -lt <span class="m">1</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span>
</span></span><span class="line"><span class="cl">  <span class="nb">echo</span> <span class="s2">&#34;Usage: </span><span class="nv">$0</span><span class="s2"> &lt;file1/dir1&gt; [file2/dir2] ...&#34;</span>
</span></span><span class="line"><span class="cl">  <span class="nb">exit</span> <span class="m">1</span>
</span></span><span class="line"><span class="cl"><span class="k">fi</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 3. Loop over every machine in the cluster</span>
</span></span><span class="line"><span class="cl"><span class="k">for</span> host in <span class="nv">$HOSTS</span><span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="cl">  <span class="nb">echo</span> <span class="s2">&#34;==================== </span><span class="nv">$host</span><span class="s2"> ====================&#34;</span>
</span></span><span class="line"><span class="cl">  
</span></span><span class="line"><span class="cl">  <span class="c1"># 4. Loop over every file/directory to send</span>
</span></span><span class="line"><span class="cl">  <span class="k">for</span> file in <span class="s2">&#34;</span><span class="nv">$@</span><span class="s2">&#34;</span><span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># 5. Check that the file exists</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="o">[</span> -e <span class="s2">&#34;</span><span class="nv">$file</span><span class="s2">&#34;</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span>
</span></span><span class="line"><span class="cl">      <span class="c1"># 6. Get the parent directory (physical path) and the file name</span>
</span></span><span class="line"><span class="cl">      <span class="nv">pdir</span><span class="o">=</span><span class="k">$(</span><span class="nb">cd</span> -P <span class="s2">&#34;</span><span class="k">$(</span>dirname <span class="s2">&#34;</span><span class="nv">$file</span><span class="s2">&#34;</span><span class="k">)</span><span class="s2">&#34;</span> <span class="o">&amp;&amp;</span> <span class="nb">pwd</span><span class="k">)</span>
</span></span><span class="line"><span class="cl">      <span class="nv">fname</span><span class="o">=</span><span class="k">$(</span>basename <span class="s2">&#34;</span><span class="nv">$file</span><span class="s2">&#34;</span><span class="k">)</span>
</span></span><span class="line"><span class="cl">      
</span></span><span class="line"><span class="cl">      <span class="c1"># 7. Create the directory on the remote host, then sync</span>
</span></span><span class="line"><span class="cl">      <span class="c1"># ssh first makes sure the directory exists, then rsync pushes the file</span>
</span></span><span class="line"><span class="cl">      ssh <span class="nv">$host</span> <span class="s2">&#34;mkdir -p </span><span class="nv">$pdir</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">      <span class="k">if</span> <span class="o">[</span> <span class="nv">$?</span> -eq <span class="m">0</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span>
</span></span><span class="line"><span class="cl">        rsync -av --rsync-path<span class="o">=</span><span class="s2">&#34;sudo rsync&#34;</span> <span class="s2">&#34;</span><span class="nv">$pdir</span><span class="s2">/</span><span class="nv">$fname</span><span class="s2">&#34;</span> <span class="s2">&#34;</span><span class="nv">$host</span><span class="s2">:</span><span class="nv">$pdir</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">      <span class="k">else</span>
</span></span><span class="line"><span class="cl">        <span class="nb">echo</span> <span class="s2">&#34;ERROR: Failed to create directory on </span><span class="nv">$host</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">      <span class="k">fi</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span>
</span></span><span class="line"><span class="cl">      <span class="nb">echo</span> <span class="s2">&#34;ERROR: </span><span class="nv">$file</span><span class="s2"> does not exist!&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">fi</span>
</span></span><span class="line"><span class="cl">  <span class="k">done</span>
</span></span><span class="line"><span class="cl"><span class="k">done</span>
</span></span></code></pre></div><p>After saving, make the script executable and copy it to the other machines.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">chmod <span class="m">755</span> /home/hadoop/bin/xsync
</span></span><span class="line"><span class="cl">scp -r /home/hadoop/bin hadoop101:/home/hadoop/
</span></span><span class="line"><span class="cl">scp -r /home/hadoop/bin hadoop102:/home/hadoop/
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># From now on you can distribute files across the cluster without the hassle of scp</span>
</span></span><span class="line"><span class="cl"><span class="c1"># Usage: xsync /full/path/to/file-or-directory</span>
</span></span></code></pre></div><h3 id="8配置集群时间同步">
<a class="header-anchor" href="#8%e9%85%8d%e7%bd%ae%e9%9b%86%e7%be%a4%e6%97%b6%e9%97%b4%e5%90%8c%e6%ad%a5"></a>
(8) Configure cluster time synchronization
</h3><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># 1. Install ntpdate</span>
</span></span><span class="line"><span class="cl">sudo yum install -y ntpdate
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 2. Sync against the Aliyun time server</span>
</span></span><span class="line"><span class="cl">sudo ntpdate ntp.aliyun.com
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 3. Add a cron job that syncs the time automatically at 2:00 every day</span>
</span></span><span class="line"><span class="cl">sudo crontab -e
</span></span><span class="line"><span class="cl"><span class="c1"># Add the following as the first line, then save and exit</span>
</span></span><span class="line"><span class="cl"><span class="m">0</span> <span class="m">2</span> * * * /usr/sbin/ntpdate ntp.aliyun.com
</span></span></code></pre></div><blockquote>
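<p>Run this on every machine.
This is only a simple way to keep the cluster clocks in sync.
Production environments use a different approach, typically a continuously running time daemon such as chronyd or ntpd.</p>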
</blockquote>
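<p>Because the cron step has to be repeated on every machine, it can be convenient to add the entry without opening an editor. A minimal sketch, assuming a hypothetical filter named <code>merge_cron_entry</code> that reads an existing crontab on standard input and prints it with the entry appended only if it is missing:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical filter: read a crontab on stdin, print it plus ENTRY if ENTRY is absent
# Usage: merge_cron_entry ENTRY
merge_cron_entry() {
  local entry=$1 existing
  existing=$(cat)
  # Pass the existing lines through unchanged
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi
  # -x matches whole lines only, -F treats the entry as a fixed string
  if ! printf '%s\n' "$existing" | grep -qxF -- "$entry"; then
    printf '%s\n' "$entry"
  fi
}
</code></pre></div>
<p>It could be wired up as <code>sudo crontab -l | merge_cron_entry "0 2 * * * /usr/sbin/ntpdate ntp.aliyun.com" | sudo crontab -</code>. Running it twice leaves a single entry, since the second pass finds the line already present.</p>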

        
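        <hr><p>This article was first published on <a href='https://blog.arcanelune.top/'>TrueSolのblog</a> on 2026-05-12 and last modified on 2026-05-12.</p><p>Unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting!</p>]]></description><category>Big Data Technology</category><category>Big Data Environment Setup</category></item></channel></rss>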