Bits & Bytes of Life

A blog of my thoughts and actions.

Install Pacemaker + pgPool-II 3.1

How To: Install Pacemaker + PgPool-II 3.1 on CentOS 5.7

This is the second post on the clustering system I have helped design and implement. In this post I will walk through building a highly available instance of pgPool-II 3.1. If you don’t know what pgPool-II is, it is a load balancing, connection pooling and database replication system for PostgreSQL. We use it so that when Bad Things Happen our customers don’t notice. This guide doesn’t cover installing pgPool-II itself; it covers making an existing install clustered and highly available.

Refer back to my previous guide, How To Install CoroSync with Pacemaker and Apache 2, for the initial steps to create the cluster. Refer to Installing pgPool-II on CentOS for steps on how to install and configure pgPool-II.

Download the resource agent below and save it as /usr/lib/ocf/resource.d/heartbeat/pgpool2 on every node in the cluster. Make sure the file is executable.

#!/bin/sh
#
#       pgpool-II resource agent.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like.  Any license provided herein, whether implied or
# otherwise, applies only to this software file.  Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
#
#######################################################################
#
# This resource agent was written by Eliot Gable <egable at gmail.com>
#
#######################################################################

#######################################################################
# Initialization:

: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs

#######################################################################

meta_data() {
        cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="pgpool2" version="0.1">
<version>1.0</version>

<longdesc lang="en">
This resource agent provides basic management of pgpool-II.
It starts and stops pgpool-II and monitors its status. It will
also monitor the status of each connection and can optionally
attempt to automatically reconnect detached nodes or can
mark the service as failed if there are any detached nodes.
</longdesc>
<shortdesc lang="en">Manages pgpool-II</shortdesc>

<parameters>

<parameter name="pcp_admin_username" required="1">
<longdesc lang="en">
Administrative username for pgpool-II control.
</longdesc>
<shortdesc lang="en">Administrative username.</shortdesc>
<content type="string" default="" />
</parameter>

<parameter name="pcp_admin_password" required="1">
<longdesc lang="en">
Administrative password for pgpool-II control.
</longdesc>
<shortdesc lang="en">Administrative password.</shortdesc>
<content type="string" default="" />
</parameter>

<parameter name="pcp_admin_port" required="1">
<longdesc lang="en">
Administrative port for pgpool-II control.
</longdesc>
<shortdesc lang="en">Administrative port for PCP commands.</shortdesc>
<content type="string" default="" />
</parameter>

<parameter name="pcp_admin_host">
<longdesc lang="en">
Administrative host for pgpool-II control.
</longdesc>
<shortdesc lang="en">Administrative host for PCP commands.</shortdesc>
<content type="string" default="localhost" />
</parameter>

<parameter name="pgpool_bin">
<longdesc lang="en">
Path to the pgpool-II binary.
</longdesc>
<shortdesc lang="en">Path to pgpool.</shortdesc>
<content type="string" default="/usr/bin/pgpool" />
</parameter>

<parameter name="pcp_attach_node_bin">
<longdesc lang="en">
Path to the pcp_attach_node binary.
</longdesc>
<shortdesc lang="en">Path to pcp_attach_node.</shortdesc>
<content type="string" default="/usr/bin/pcp_attach_node" />
</parameter>

<parameter name="pcp_detach_node_bin">
<longdesc lang="en">
Path to the pcp_detach_node binary.
</longdesc>
<shortdesc lang="en">Path to pcp_detach_node.</shortdesc>
<content type="string" default="/usr/bin/pcp_detach_node" />
</parameter>

<parameter name="pcp_node_count_bin">
<longdesc lang="en">
Path to the pcp_node_count binary.
</longdesc>
<shortdesc lang="en">Path to pcp_node_count.</shortdesc>
<content type="string" default="/usr/bin/pcp_node_count" />
</parameter>

<parameter name="pcp_node_info_bin">
<longdesc lang="en">
Path to the pcp_node_info binary.
</longdesc>
<shortdesc lang="en">Path to pcp_node_info.</shortdesc>
<content type="string" default="/usr/bin/pcp_node_info" />
</parameter>

<parameter name="stop_mode">
<longdesc lang="en">
Stop mode to use when stopping pgpool-II.
</longdesc>
<shortdesc lang="en">Stop mode for pgpool-II.</shortdesc>
<content type="string" default="f" />
</parameter>

<parameter name="auto_reconnect">
<longdesc lang="en">
If "true", then during monitoring actions,
the resource agent will attempt to re-attach any disconnected
nodes. No error will be reported if re-attachment fails.
</longdesc>
<shortdesc lang="en">Automatically reattach failed nodes</shortdesc>
<content type="boolean" default="" />
</parameter>

<parameter name="fail_on_detached">
<longdesc lang="en">
This instructs the resource agent to mark pgpool-II in a
failed state if one or more of the nodes is detached. The monitor
action will always mark pgpool-II in a failed state if all nodes are
detached, so this is only useful if you want to mark pgpool-II in a
failed state if at least one node is detached. The auto_reconnect
option will always try to reconnect detached nodes (if enabled)
before this fail_on_detached mechanism triggers.
</longdesc>
<shortdesc lang="en">Marks resource as failed if at least one node is detached.</shortdesc>
<content type="boolean" default="" />
</parameter>

<parameter name="fail_on_node_detached">
<longdesc lang="en">
This is similar to fail_on_detached, except you can
specify a comma-separated list of node IDs. If specified, pgpool2
will only be marked as "failed" if one of the nodes in the list
is detached or if all nodes are detached.
</longdesc>
<shortdesc lang="en">Specify a list of nodes to monitor for failure.</shortdesc>
<content type="string" default="" />
</parameter>

</parameters>

<actions>
<action name="start"        timeout="20" />
<action name="stop"         timeout="40" />
<action name="monitor"     timeout="20" interval="5" depth="0"/>
<action name="reload"       timeout="20" />
<action name="meta-data"  timeout="5" />
<action name="validate-all"  timeout="20" />
</actions>
</resource-agent>
END
}

#######################################################################

pgpool2_usage() {
        cat <<END
usage: $0 {start|stop|status|monitor|validate-all|meta-data}

Expects to have a fully populated OCF RA-compliant environment set.
END
}

pgpool2_start() {
    if pgpool2_status; then
        ocf_log debug "${OCF_RESOURCE_INSTANCE} $__OCF_ACTION : already started."
        return $OCF_SUCCESS
    fi
    if $PGPOOL; then
        echo "Trying to start pgpool..." >> /tmp/pgpool
        sleep 2
        if pgpool2_status; then
                ocf_log info "${OCF_RESOURCE_INSTANCE} Successfully started pgpool-II"
                return $OCF_SUCCESS
        else
                echo "pgpool couldn't be started, why?" >> /tmp/pgpool
                ocf_log error "${OCF_RESOURCE_INSTANCE} Failed to start pgpool-II"
                return $OCF_ERR_GENERIC
        fi
    else
        ocf_log error "${OCF_RESOURCE_INSTANCE} Failed to start pgpool-II"
    fi
    return $OCF_ERR_GENERIC
}

pgpool2_stop() {
    ocf_log info "Using $PGPOOL -m $STOP_MODE stop to stop pgpool-II"
    if $PGPOOL -m $STOP_MODE stop; then
        ocf_log info "${OCF_RESOURCE_INSTANCE} Successfully stopped pgpool-II"
        return $OCF_SUCCESS
    else
        ocf_log info "${OCF_RESOURCE_INSTANCE} Already stopped pgpool-II"
        return $OCF_SUCCESS
    fi
}

pgpool2_status() {
    if [ ! -r "/var/run/pgpool/pgpool.pid" ]; then
        return $OCF_NOT_RUNNING
    fi
    ps_info=$(ps ax | grep "pgpool" | grep $(cat /var/run/pgpool/pgpool.pid))
    echo "ps_info: $ps_info" >> /tmp/pgpool
    if [ -z "$ps_info" ]; then
        echo "Found pgpool is NOT running..." >> /tmp/pgpool
        return $OCF_NOT_RUNNING
    else
        echo "Found pgpool is running..." >> /tmp/pgpool
        # Try to reconnect any detached nodes
        if is_ocf_true ${OCF_RESKEY_auto_reconnect} ; then
            NODE_COUNT=$($PCP_NODE_COUNT 1 $OCF_RESKEY_pcp_admin_host $OCF_RESKEY_pcp_admin_port $OCF_RESKEY_pcp_admin_username $OCF_RESKEY_pcp_admin_password)
            for node in `seq 0 $((NODE_COUNT-1))`; do
                NODE_INFO=$($PCP_NODE_INFO 1 $OCF_RESKEY_pcp_admin_host $OCF_RESKEY_pcp_admin_port $OCF_RESKEY_pcp_admin_username $OCF_RESKEY_pcp_admin_password $node | awk '{print $3}')
                if [ "$NODE_INFO" = "3" ]; then
                    ocf_log info "Node $node is currently detached. Attempting to reattach the node."
                    $PCP_ATTACH_NODE 1 $OCF_RESKEY_pcp_admin_host $OCF_RESKEY_pcp_admin_port $OCF_RESKEY_pcp_admin_username $OCF_RESKEY_pcp_admin_password $node
                    ATTACHED="1"
                fi
            done
            if [ -n "${ATTACHED}" ]; then
                sleep 1
            fi
        fi
        # Fail if configured to fail on one or more detached nodes and a node is still detached
        if is_ocf_true ${OCF_RESKEY_fail_on_detached} ; then
            NODE_COUNT=$($PCP_NODE_COUNT 1 $OCF_RESKEY_pcp_admin_host $OCF_RESKEY_pcp_admin_port $OCF_RESKEY_pcp_admin_username $OCF_RESKEY_pcp_admin_password)
            for node in `seq 0 $((NODE_COUNT-1))`; do
                NODE_INFO=$($PCP_NODE_INFO 1 $OCF_RESKEY_pcp_admin_host $OCF_RESKEY_pcp_admin_port $OCF_RESKEY_pcp_admin_username $OCF_RESKEY_pcp_admin_password $node | awk '{print $3}')
                if [ "$NODE_INFO" = "3" ]; then
                    ocf_log error "Node $node is detached. The pgpool-II service has failed."
                    return $OCF_ERR_GENERIC
                fi
            done
        fi
        # Fail if one of the specifically configured nodes is detached at this point
        if [ -n "${OCF_RESKEY_fail_on_node_detached}" ]; then
            NODE_COUNT=$($PCP_NODE_COUNT 1 $OCF_RESKEY_pcp_admin_host $OCF_RESKEY_pcp_admin_port $OCF_RESKEY_pcp_admin_username $OCF_RESKEY_pcp_admin_password)
            for node in `seq 0 $((NODE_COUNT-1))`; do
                NODE_INFO=$($PCP_NODE_INFO 1 $OCF_RESKEY_pcp_admin_host $OCF_RESKEY_pcp_admin_port $OCF_RESKEY_pcp_admin_username $OCF_RESKEY_pcp_admin_password $node | awk '{print $3}')
                if [ "$NODE_INFO" = "3" ]; then
                    # Walk the comma-separated list one token at a time.
                    # A list with a single node ID is handled too.
                    TOKEN_STRING=${OCF_RESKEY_fail_on_node_detached}
                    while [ -n "$TOKEN_STRING" ]; do
                        TOKEN=${TOKEN_STRING%%,*}
                        if [ "$TOKEN" = "$node" ]; then
                            ocf_log error "Node $node is detached. The pgpool-II service has failed."
                            return $OCF_ERR_GENERIC
                        fi
                        # Drop the token just checked; stop when no comma remains
                        if [ "${TOKEN_STRING}" = "${TOKEN_STRING#*,}" ]; then
                            TOKEN_STRING=""
                        else
                            TOKEN_STRING=${TOKEN_STRING#*,}
                        fi
                    done
                fi
            done
        fi
    fi
    # Service is running and there is no reason to fail
    echo "pgpool should be running..." >> /tmp/pgpool
    return $OCF_SUCCESS
}

pgpool2_validate() {
    # If we're running as a clone, are the clone meta attrs OK?
#    if [ "${OCF_RESKEY_CRM_meta_clone}" ]; then
#       if [ "${OCF_RESKEY_CRM_meta_clone_node_max}" != 1 ]; then
#           ocf_log error "Misconfigured clone parameters. Must set meta attribute \"clone_node_max\" to 1, got ${OCF_RESKEY_CRM_meta_clone_node_max}."
#           return $OCF_ERR_ARGS
#       fi
#    fi
    if [ -z "$OCF_RESKEY_pcp_admin_username" ]; then
        ocf_log error "Missing required parameter \"pcp_admin_username\"."
        return $OCF_ERR_ARGS
    fi
    if [ -z "$OCF_RESKEY_pcp_admin_password" ]; then
        ocf_log error "Missing required parameter \"pcp_admin_password\"."
        return $OCF_ERR_ARGS
    fi
    if [ -z "$OCF_RESKEY_pcp_admin_host" ]; then
        ocf_log error "Missing required parameter \"pcp_admin_host\"."
        return $OCF_ERR_ARGS
    fi
    if [ -z "$OCF_RESKEY_pcp_admin_port" ]; then
        ocf_log error "Missing required parameter \"pcp_admin_port\"."
        return $OCF_ERR_ARGS
    fi
    # Did we get a path for the pgpool binary?
    if [ -z "$OCF_RESKEY_pgpool_bin" ]; then
        ocf_log error "Missing required parameter \"pgpool_bin\"."
        return $OCF_ERR_ARGS
    else
        if [ ! -x "$OCF_RESKEY_pgpool_bin" ]; then
            ocf_log error "The pgpool binary is not executable or is not installed."
            return $OCF_ERR_INSTALLED
        fi
    fi
    # Did we get a path for the pcp_attach_node binary?
    if [ -z "$OCF_RESKEY_pcp_attach_node_bin" ]; then
        ocf_log error "Missing required parameter \"pcp_attach_node_bin\"."
        return $OCF_ERR_ARGS
    else
        if [ ! -x "$OCF_RESKEY_pcp_attach_node_bin" ]; then
            ocf_log error "The pcp_attach_node binary is not executable or is not installed."
            return $OCF_ERR_INSTALLED
        fi
    fi
    # Did we get a path for the pcp_detach_node binary?
    if [ -z "$OCF_RESKEY_pcp_detach_node_bin" ]; then
        ocf_log error "Missing required parameter \"pcp_detach_node_bin\"."
        return $OCF_ERR_ARGS
    else
        if [ ! -x "$OCF_RESKEY_pcp_detach_node_bin" ]; then
            ocf_log error "The pcp_detach_node binary is not executable or is not installed."
            return $OCF_ERR_INSTALLED
        fi
    fi
    # Did we get a path for the pcp_node_count binary?
    if [ -z "$OCF_RESKEY_pcp_node_count_bin" ]; then
        ocf_log error "Missing required parameter \"pcp_node_count_bin\"."
        return $OCF_ERR_ARGS
    else
        if [ ! -x "$OCF_RESKEY_pcp_node_count_bin" ]; then
            ocf_log error "The pcp_node_count binary is not executable or is not installed."
            return $OCF_ERR_INSTALLED
        fi
    fi
    # Did we get a path for the pcp_node_info binary?
    if [ -z "$OCF_RESKEY_pcp_node_info_bin" ]; then
        ocf_log error "Missing required parameter \"pcp_node_info_bin\"."
        return $OCF_ERR_ARGS
    else
        if [ ! -x "$OCF_RESKEY_pcp_node_info_bin" ]; then
            ocf_log error "The pcp_node_info binary is not executable or is not installed."
            return $OCF_ERR_INSTALLED
        fi
    fi
    if [ -n "$OCF_RESKEY_stop_mode" ]; then
        if ! echo "$OCF_RESKEY_stop_mode" | grep -Eq '^(s|f|i|smart|fast|immediate)$'; then
            ocf_log error "Stop mode is invalid."
            return $OCF_ERR_ARGS
        fi
    else
        ocf_log error "Stop mode was not specified."
        return $OCF_ERR_ARGS
    fi
    if [ -n "$OCF_RESKEY_auto_reconnect" ] &&
       [ "$OCF_RESKEY_auto_reconnect" != "true" ] &&
       [ "$OCF_RESKEY_auto_reconnect" != "false" ]; then
        ocf_log error "Parameter 'auto_reconnect' must be empty, 'true', or 'false'."
        return $OCF_ERR_ARGS
    fi
    if [ -n "$OCF_RESKEY_fail_on_detached" ] &&
       [ "$OCF_RESKEY_fail_on_detached" != "true" ] &&
       [ "${OCF_RESKEY_fail_on_detached}" != "false" ]; then
        ocf_log error "Parameter 'fail_on_detached' must be empty, 'true', or 'false'."
        return $OCF_ERR_ARGS
    fi
    if [ -n "$OCF_RESKEY_fail_on_node_detached" ]; then
        # Every comma-separated token must be a non-negative integer node ID
        TOKEN_STRING=${OCF_RESKEY_fail_on_node_detached}
        while [ -n "$TOKEN_STRING" ]; do
            TOKEN=${TOKEN_STRING%%,*}
            case "$TOKEN" in
                ''|*[!0-9]*)
                    ocf_log error "Invalid token '${TOKEN}' in parameter 'fail_on_node_detached'."
                    return $OCF_ERR_ARGS
                ;;
            esac
            # Drop the token just checked; stop when no comma remains
            if [ "${TOKEN_STRING}" = "${TOKEN_STRING#*,}" ]; then
                TOKEN_STRING=""
            else
                TOKEN_STRING=${TOKEN_STRING#*,}
            fi
        done
    fi
    return $OCF_SUCCESS
}

# These two actions must always succeed
case $__OCF_ACTION in
meta-data)      meta_data
                # OCF variables are not set when querying meta-data
                exit 0
                ;;
usage|help)     pgpool2_usage
                exit $OCF_SUCCESS
                ;;
esac

pgpool2_validate || exit $?

PGPOOL=$OCF_RESKEY_pgpool_bin
PCP_ATTACH_NODE=$OCF_RESKEY_pcp_attach_node_bin
PCP_DETACH_NODE=$OCF_RESKEY_pcp_detach_node_bin
PCP_NODE_COUNT=$OCF_RESKEY_pcp_node_count_bin
PCP_NODE_INFO=$OCF_RESKEY_pcp_node_info_bin
STOP_MODE=$OCF_RESKEY_stop_mode
echo "$(date)" >> /tmp/pgpool
case $__OCF_ACTION in
start)          pgpool2_start;;
stop)           pgpool2_stop;;
status|monitor) pgpool2_status;;
validate-all)   ;;
*)              pgpool2_usage
                exit $OCF_ERR_UNIMPLEMENTED
                ;;
esac
rc=$?
ocf_log debug "${OCF_RESOURCE_INSTANCE} $__OCF_ACTION returned $rc"
exit $rc
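
Pacemaker can only run the agent if the file is in place and executable on each node. Here is a minimal sketch of that install step. The helper name install_agent is my own invention for illustration, and the source path is whatever location you downloaded the script to.

```shell
#!/bin/sh
# Sketch: copy a downloaded resource agent into the OCF tree and mark it
# executable. install_agent is a hypothetical helper, not part of Pacemaker.
# $1 = downloaded file, $2 = target directory (defaults to the path used above)
install_agent() {
    src=$1
    dest_dir=${2:-/usr/lib/ocf/resource.d/heartbeat}
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
    [ -d "$dest_dir" ] || { echo "missing directory $dest_dir" >&2; return 1; }
    # install(1) copies the file and sets its mode in one step
    install -m 0755 "$src" "$dest_dir/pgpool2"
}

# Example (as root, on every cluster node), followed by a quick smoke test
# that asks the agent for its metadata:
#   install_agent ./pgpool2
#   OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/pgpool2 meta-data
```

If the meta-data call prints the XML description, the agent is at least loadable. It says nothing yet about whether the PCP credentials or paths are right.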

Update the cluster configuration so that pgPool-II is managed as a cluster resource. Change the parameters below to match your environment!

crm configure primitive pgPool ocf:heartbeat:pgpool2 \
params pcp_admin_username=postgres \
params pcp_admin_password=postgres \
params pcp_admin_port=9898 \
params pcp_admin_host=localhost \
params pgpool_bin=/usr/sbin/pgpool \
params pcp_attach_node_bin=/usr/bin/pcp_attach_node \
params pcp_detach_node_bin=/usr/bin/pcp_detach_node \
params pcp_node_count_bin=/usr/bin/pcp_node_count \
params pcp_node_info_bin=/usr/bin/pcp_node_info \
params stop_mode=f \
params auto_reconnect=true \
op monitor interval=1min

crm configure show
crm status
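
With auto_reconnect enabled, each monitor run asks pgPool-II for the status of every backend and reattaches the ones reported as status 3, which is what the agent treats as detached. The sketch below isolates that scan so you can try it by hand. The function name detached_nodes is hypothetical, and it assumes the positional PCP syntax the agent uses (timeout, host, port, user, password, node ID) and the four-column pcp_node_info output with the status in the third column.

```shell
#!/bin/sh
# Sketch of the detached-node scan done by the agent's monitor action.
# PCP_NODE_INFO is the command to run; override it to point at your binary.
PCP_NODE_INFO=${PCP_NODE_INFO:-/usr/bin/pcp_node_info}

# Print the IDs of all backends whose status column is 3.
# $1 = number of backends, remaining arguments = host port user password
detached_nodes() {
    count=$1
    shift
    node=0
    while [ "$node" -lt "$count" ]; do
        # Assumed output: "hostname port status weight"
        status=$($PCP_NODE_INFO 1 "$@" "$node" | awk '{print $3}')
        if [ "$status" = "3" ]; then
            echo "$node"
        fi
        node=$((node + 1))
    done
    return 0
}

# Example, using the values from the primitive above and two backends:
#   detached_nodes 2 localhost 9898 postgres postgres
```

An empty result means no backend is currently reported as detached.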

Configure Pacemaker to keep multiple resources together. In this case, pgPool-II can only run on the system with the ClusterIP.

crm configure colocation pgpool-with-ip INFINITY: pgPool ClusterIP
crm configure show
crm_mon
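
To confirm the colocation constraint is doing its job, you can ask Pacemaker where each resource is running and compare the answers. This is a rough sketch. The helper names resource_node and colocated are made up for illustration, and the parsing assumes crm_resource reports a running resource as "resource NAME is running on: NODE".

```shell
#!/bin/sh
# Sketch: check that two resources are active on the same node.
# CRM_RESOURCE can be overridden if crm_resource is not on the PATH.
CRM_RESOURCE=${CRM_RESOURCE:-crm_resource}

# Print the node a resource runs on (prints nothing if it is not running).
# Assumes a line of the form "resource NAME is running on: NODE".
resource_node() {
    $CRM_RESOURCE --resource "$1" --locate 2>/dev/null |
        sed -n 's/^.*is running on: *//p'
}

# Succeed only if both resources are running and on the same node.
colocated() {
    a=$(resource_node "$1")
    b=$(resource_node "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example:
#   colocated pgPool ClusterIP && echo "pgPool follows the ClusterIP"
```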

Configure Pacemaker to start the ClusterIP before pgPool and stop it after pgPool is stopped.

crm configure order pgPool-after-ip mandatory: ClusterIP pgPool
crm configure show

After a few moments, Pacemaker should start pgPool on one of the nodes.

crm_mon
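
Once pgPool is up, it is worth forcing a failover to see the constraints in action. The sketch below puts a node into standby, waits, shows the cluster status, and then brings the node back online. The function name failover_test and the SETTLE_SECONDS wait are assumptions of mine; pick a wait that suits your monitor interval and timeouts.

```shell
#!/bin/sh
# Sketch: force pgPool and the ClusterIP to move by putting a node in standby.
# CRM may be overridden (for example CRM=echo) to do a dry run.
CRM=${CRM:-crm}
SETTLE_SECONDS=${SETTLE_SECONDS:-15}

# $1 = name of the node currently running pgPool
failover_test() {
    node=$1
    # Standby makes the node ineligible to run resources, so they must move
    $CRM node standby "$node" || return 1
    sleep "$SETTLE_SECONDS"
    # pgPool and ClusterIP should now both be listed on another node
    $CRM status
    # Allow the node to host resources again
    $CRM node online "$node"
}

# Example:
#   failover_test node1
```

Run it against the node that currently holds the ClusterIP, otherwise nothing will move.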

References:

Installing pgPool-II on CentOS
ClusterLabs Clusters from Scratch
