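When do you need to add a node?
As an Oracle engineer at a service vendor, I often get requests from customers to add a node to a RAC cluster. The usual situations are: a node's host hardware has failed, its operating system is damaged, the software was deleted by mistake, or a newly purchased machine needs to join the cluster. Handling these involves more than the physical installation and basic configuration; consistency, network configuration, and shared storage also have to be considered. The cases shared here cover the problems that tend to come up when running addNode.sh after all the preparation (host, operating system, kernel parameters, shared storage) has been completed.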
$ ./addNode.sh "CLUSTER_NEW_NODES={rac2}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={rac2-vip}"
Scenario 1
The following errors appear:
PRCF-2022 : The file "/u01/11.2.0/grid/javavm/admin/classes.bin" became smaller while being transferred
PRCF-2022 : The file "/u01/11.2.0/grid/lib/libserver11.a" became smaller while being transferred
PRCF-2022 : The file "/u01/11.2.0/grid/install/usm/Novell/SLES11/x86_64/2.6.32.12-0.7/xen/bin/oracleacfs.ko" became smaller while being transferred
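However, all three files can be found on node 2, and their sizes and permission attributes are the same as on node 1, so I suspect the problem is related to node 2. (Node 2 originally belonged to the cluster; it was removed following the RAC node deletion procedure and then added back, which may be the cause.)
Solution:
On node 1, as the grid user, add the following entries to $ORACLE_HOME/crs/install/install.excl: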
javavm/admin/classes.bin
lib/libserver11.a
install/usm/Novell/SLES11/x86_64/2.6.32.12-0.7/xen/bin/oracleacfs.ko
There is no need to delete the grid directory on node 2; just run addNode.sh again on node 1.
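The edit to install.excl can be scripted so that repeated runs do not duplicate entries. The following is a minimal sketch, not an Oracle-provided tool: the function name add_exclusions is hypothetical, and it assumes install.excl is a plain text file with one relative path per line, as the entries listed above suggest.

```shell
#!/bin/sh
# add_exclusions FILE ENTRY...
# Hypothetical helper: append each ENTRY to FILE unless an identical
# line is already present.
add_exclusions() {
    excl_file=$1
    shift
    # If the file is non-empty and does not end with a newline, add one so
    # the first appended entry does not merge with the last existing line.
    if [ -s "$excl_file" ] && [ -n "$(tail -c 1 "$excl_file")" ]; then
        printf '\n' >> "$excl_file"
    fi
    for entry in "$@"; do
        # -x: whole-line match, -F: fixed string, -q: no output
        if ! grep -qxF -- "$entry" "$excl_file" 2>/dev/null; then
            printf '%s\n' "$entry" >> "$excl_file"
        fi
    done
}

# Example, run as the grid user on node 1 (path as given in the text):
# add_exclusions "$ORACLE_HOME/crs/install/install.excl" \
#     javavm/admin/classes.bin \
#     lib/libserver11.a \
#     install/usm/Novell/SLES11/x86_64/2.6.32.12-0.7/xen/bin/oracleacfs.ko
```

Taking a copy of install.excl before editing it makes the change easy to revert.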
Scenario 2
The following errors appear:
PRKC-1025 : Failed to create a file under the filepath /opt because the filepath is not executable or writable
java.lang.OutOfMemoryError
Solution:
In the oraparam.ini file located in $GRID_HOME/oui/, increase JRE_MEMORY_OPTIONS to " -mx1024m" or a larger value.
Reference: Addnode.sh or Fresh Install Fails With PRKC-1025 and Java.Lang.OutOfMemoryError (Doc ID 1085893.1)
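As a rough sketch of how the JRE_MEMORY_OPTIONS change could be applied from the command line: the helper name set_jre_memory is hypothetical, and it assumes the setting appears on lines that start with JRE_MEMORY_OPTIONS= (every such line in the file is rewritten). The path to oraparam.ini is passed in by the caller.

```shell
#!/bin/sh
# set_jre_memory FILE VALUE
# Hypothetical helper: rewrite every line starting with JRE_MEMORY_OPTIONS=
# in FILE to JRE_MEMORY_OPTIONS=" VALUE", keeping the original as FILE.bak.
# VALUE is assumed to be a simple option such as -mx1024m (no "/" or "&").
set_jre_memory() {
    ini_file=$1
    new_value=$2
    # Keep a backup with the original timestamps and permissions.
    cp -p "$ini_file" "$ini_file.bak" || return 1
    # Read from the backup and write back into the original file, so the
    # original file's ownership and mode are preserved.
    sed "s/^JRE_MEMORY_OPTIONS=.*/JRE_MEMORY_OPTIONS=\" $new_value\"/" \
        "$ini_file.bak" > "$ini_file"
}

# Example (substitute the actual location of oraparam.ini):
# set_jre_memory /path/to/oraparam.ini -mx1024m
```

Reviewing the difference between the .bak copy and the edited file before rerunning addNode.sh is a cheap way to confirm that only the intended lines changed.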
Scenario 3
Performing tests to see whether nodes rac2,rac3 are available
.................SEVERE:OUI-35000: Fatal cluster error encountered (PRKN-1034 : Failed to retrieve IP address of host "rac2"). Correct the problem and try the operation again.
Solution:
The cause is leftover RAC information in the inventory. Update the inventory so that it reflects the actual node list:
$ cd $ORACLE_HOME/oui/bin
$ ./runInstaller -updateNodeList ORACLE_HOME=<Oracle_home_location> "CLUSTER_NODES={remaining_node_list}"
Here remaining_node_list stands for all the other nodes.
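To avoid typing mistakes when building the remaining_node_list value, a small helper can assemble it from the full node list. This is only an illustrative sketch: remaining_nodes is a hypothetical function, and the node names in the example are taken from the error output above.

```shell
#!/bin/sh
# remaining_nodes EXCLUDE NODE...
# Hypothetical helper: print every NODE except EXCLUDE as a comma-separated
# list wrapped in braces, the form used by CLUSTER_NODES={...}.
remaining_nodes() {
    exclude=$1
    shift
    list=
    for node in "$@"; do
        [ "$node" = "$exclude" ] && continue
        # Prefix a comma only when the list already has an element.
        list="${list:+$list,}$node"
    done
    printf '{%s}\n' "$list"
}

# Example:
# ./runInstaller -updateNodeList ORACLE_HOME=<Oracle_home_location> \
#     "CLUSTER_NODES=$(remaining_nodes rac2 rac1 rac2 rac3)"
```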
Scenario 4
If running root.sh produces the following errors:
[root@hisrac01 grid]# /oracle/app/oraInventory/orainstRoot.sh
[064110]Start of resource "ora.cssd" failed
[064110]CRS-2672: Attempting to start 'ora.cssdmonitor' on 'hisrac01'
[064110]CRS-2672: Attempting to start 'ora.gipcd' on 'hisrac01'
[064110]CRS-2676: Start of 'ora.cssdmonitor' on 'hisrac01' succeeded
[064110]CRS-2676: Start of 'ora.gipcd' on 'hisrac01' succeeded
[064110]CRS-2672: Attempting to start 'ora.cssd' on 'hisrac01'
[064110]CRS-2672: Attempting to start 'ora.diskmon' on 'hisrac01'
[064110]CRS-2676: Start of 'ora.diskmon' on 'hisrac01' succeeded
[064110]CRS-2674: Start of 'ora.cssd' on 'hisrac01' failed
[064110]CRS-2679: Attempting to clean 'ora.cssd' on 'hisrac01'
[064110]CRS-2681: Clean of 'ora.cssd' on 'hisrac01' succeeded
[064110]CRS-2673: Attempting to stop 'ora.gipcd' on 'hisrac01'
[064110]CRS-2677: Stop of 'ora.gipcd' on 'hisrac01' succeeded
[064110]CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'hisrac01'
[064110]CRS-2677: Stop of 'ora.cssdmonitor' on 'hisrac01' succeeded
[064110]CRS-5804: Communication error with agent process
[064110]CRS-4000: Command Start failed, or completed with errors.
[064110]Failed to start Oracle Grid Infrastructure stack
[064111]Failed to start Cluster Synchorinisation Service in clustered mode at /oracle/11.2.0/grid/crs/install/crsconfig_lib.pm line 1278.
[064111]/oracle/11.2.0/grid/perl/bin/perl -I/oracle/11.2.0/grid/perl/lib -I/oracle/11.2.0/grid/crs/install /oracle/11.2.0/grid/crs/install/rootcrs.pl execution failed
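Solution:
Check that the private interconnect (heartbeat) NIC is configured identically on both nodes; the subnet masks must match.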
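One way to make the comparison concrete is to compute the network address from each node's interconnect IP and netmask and check that the results agree. The sketch below is an assumption-laden illustration: network_of is a hypothetical helper, it handles IPv4 dotted-quad input only, and the addresses in the example are made up.

```shell
#!/bin/sh
# network_of IP NETMASK
# Hypothetical helper: print the IPv4 network address obtained by AND-ing
# each octet of IP with the matching octet of NETMASK.
network_of() {
    old_ifs=$IFS
    IFS=.
    # Split both dotted quads into eight positional parameters:
    # $1..$4 are the IP octets, $5..$8 are the netmask octets.
    set -- $1 $2
    IFS=$old_ifs
    printf '%d.%d.%d.%d\n' \
        "$(( $1 & $5 ))" "$(( $2 & $6 ))" "$(( $3 & $7 ))" "$(( $4 & $8 ))"
}

# Example with made-up interconnect addresses; a mismatch in the netmask
# shows up as different network addresses on the two nodes:
# network_of 192.168.10.1 255.255.255.0   # value from node 1
# network_of 192.168.10.2 255.255.0.0     # value from node 2
```

If the two nodes report different network addresses for the same interconnect, correct the netmask on the node that is wrong before rerunning root.sh.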
Scenario 5
If running root.sh produces the following errors:
[root@hisrac01 /]# /oracle/11.2.0/grid/root.sh
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /oracle/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
OLR initialization - successful
Adding Clusterware entries to upstart
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node hisrac02, number 2, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
Start of resource "ora.asm" failed
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'hisrac01'
CRS-2676: Start of 'ora.drivers.acfs' on 'hisrac01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'hisrac01'
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-48189: OS command to create directory failed
Linux-x86_64 Error: 13: Permission denied
Additional information: 2
. For details refer to "(:CLSN00107:)" in "/oracle/11.2.0/grid/log/hisrac01/agent/ohasd/oraagent_grid/oraagent_grid.log".
CRS-2674: Start of 'ora.asm' on 'hisrac01' failed
CRS-2679: Attempting to clean 'ora.asm' on 'hisrac01'
CRS-2681: Clean of 'ora.asm' on 'hisrac01' succeeded
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'hisrac01'
CRS-2677: Stop of 'ora.drivers.acfs' on 'hisrac01' succeeded
CRS-4000: Command Start failed, or completed with errors.
Failed to start Oracle Grid Infrastructure stack
Failed to start ASM at /oracle/11.2.0/grid/crs/install/crsconfig_lib.pm line 1339.
/oracle/11.2.0/grid/perl/bin/perl -I/oracle/11.2.0/grid/perl/lib -I/oracle/11.2.0/grid/crs/install /oracle/11.2.0/grid/crs/install/rootcrs.pl execution failed
Solution:
ORA-48141: error creating directory during ADR initialization [<oracle_base>/base/diag]
ORA-48189: OS command to create directory failed
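Create the $ORACLE_BASE/base/diag directory, and make sure the grid user can write to it (the underlying error is "Permission denied").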
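A small check along these lines can confirm the directory exists and is writable before root.sh is rerun. It is a sketch only: ensure_writable_dir is a hypothetical helper, and it should be run as the user that owns the Grid Infrastructure installation, which is an assumption about the environment.

```shell
#!/bin/sh
# ensure_writable_dir DIR
# Hypothetical helper: create DIR (and any missing parents), then report
# success only if DIR exists and the current user can write to it.
ensure_writable_dir() {
    dir=$1
    mkdir -p "$dir" || return 1
    [ -d "$dir" ] && [ -w "$dir" ]
}

# Example, run as the grid user:
# ensure_writable_dir "$ORACLE_BASE/base/diag" \
#     || echo "cannot create or write to $ORACLE_BASE/base/diag" >&2
```

If the check fails for the grid user, the ownership or permissions of a parent directory need to be corrected by root first.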
Scenario 6
The errors are as follows:
PRCR-1079 : Failed to start resource ora.rac1.vip
CRS-2674: Start of 'ora.net1.network' on 'rac1' failed
CRS-2674: Start of 'ora.net1.network' on 'rac2' failed
CRS-2632: There are no more servers to try to place resource 'ora.rac1.vip' on that would satisfy its placement policy
start VIP on node:rac1 ... failed
Failed to perform new node configuration at /oracle/grid/crs_1/crs/install/crsconfig_lib.pm line 9209.
/oracle/grid/crs_1/perl/bin/perl -I/oracle/grid/crs_1/perl/lib -I/oracle/grid/crs_1/crs/install /oracle/grid/crs_1/crs/install/rootcrs.pl execution failed
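Solution:
This happened because I later changed the NETMASK on rac2. Although I ran service network restart, I did not make the corresponding change in the cluster configuration, hence the error. It can be fixed as follows: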
./srvctl modify nodeapps -n rac2 -A 10.0.0.101/255.255.255.0/eth0 ###public ip
./srvctl modify nodeapps -n rac2 -A 10.0.0.103/255.255.255.0/eth0 ###vip
After making the changes, restart the cluster.