I recently stood up a CFEngine 3 configuration management infrastructure
and took notes during the process to share with my team. This was my
first attempt at using CFEngine, so hopefully this multi-part overview
will help others trying to bootstrap their environments as well. Many of
these notes were taken from the CFEngine 3 reference manual and tutorial
found on the docs website. There is some excellent documentation on
cfengine.org, so if you have more questions about something specific, be
sure to check out the reference manuals!
Neil Watson has also compiled an excellent tutorial on his CFEngine 3
setup. I based some of the structure of my config files on his examples.
There is also the CFEngine help mailing list, whose archives you can
browse through the web. Some of the details in the following
documentation (building software, SMF scripts) may be Solaris 10
specific, as that was the platform I was working with.
High Level Architecture and Objectives
What are some examples of what CFEngine can do?
- Performing post-installation tasks such as configuring the network interface.
- Editing system configuration files and other files.
- Creating symbolic links.
- Checking and correcting file permissions and ownership.
- Deleting unwanted files.
- Compressing selected files.
- Distributing files within a network.
- Automatically mounting NFS file systems.
- Verifying the presence and integrity of important files and file systems.
- Executing commands and scripts.
- Applying security-related patches and similar system corrections.
- Managing system server processes.
- Making sandwiches via sudo.
Fundamental concepts, rules, and terms CFEngine uses.
- Host: Generally, a host is a single computer that runs an operating
system like UNIX, Linux, or Windows. We will sometimes talk about
machines too, and a host can also be a virtual machine supported by an
environment such as VMware or Xen/Linux.
- Policy: This is a specification of what we want a host to be like.
Rather than being a computer program, a policy is essentially a piece
of documentation that describes technical details and characteristics.
CFEngine implements policies that are specified via directives.
- Configuration: The configuration of a host is the actual state of
its resources.
- Operation: A unit of change is called an operation. CFEngine deals
with changes to a system, and operations are embedded into the basic
sentences of a CFEngine policy. They tell us how policy constrains a
host; in other words, how we will prevent a host from running away.
- Convergence: An operation is convergent if it always brings the
configuration of a host closer to its ideal state and has no effect if
the host is already in that state.
- Classes: A class is a way of slicing up and mapping out the complex
environment of one or more hosts into regions that can then be referred
to by a symbol or name. Classes describe scope: where something is to be
constrained.
- Autonomy: No CFEngine component is capable of receiving information
that it has not explicitly asked for itself.
- Scalable distributed action: Each host is responsible for carrying
out checks and maintenance on/for itself, based on its local copy of
policy.
- The fact that each CFEngine agent keeps a local copy of policy
(regardless of whether it was written locally or inherited from a
central authority) means that CFEngine will continue to function even if
network communications are down.
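
To make the idea of convergence a bit more concrete, here is a minimal
Python sketch of a convergent operation. It is not CFEngine code and not
how cf-agent is implemented; the function name ensure_mode is
hypothetical, and file permissions are just a convenient example of a
resource with a desired state.

```python
import os
import stat


def ensure_mode(path: str, wanted: int) -> bool:
    """Bring the permission bits of `path` to `wanted`.

    Returns True if a repair was made, False if the file was
    already in the desired state (in which case nothing is touched).
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current == wanted:
        # Already at the ideal state: a convergent operation has no effect.
        return False
    # The state has drifted: move it to the ideal state.
    os.chmod(path, wanted)
    return True
```

Calling ensure_mode on a file whose mode has drifted repairs it and
returns True; calling it again immediately afterwards changes nothing
and returns False. That "safe to repeat every few minutes" property is
what the definition above is describing.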
Critical CFEngine Daemons and Commands
- cf-agent: Interprets policy promises and implements them in a
convergent manner. The agent fetches data from cf-serverd running on the
Master Policy Servers.
- cf-execd: Executes cf-agent and logs its output (optionally sending
a summary via email). It can be run in daemon (standalone) mode. We have
configured Solaris’ SMF to keep cf-execd online, which drives cf-agent.
- cf-serverd: Monitors the CFEngine port and serves file data to
cf-agent. Every bit of data that we transfer between cf-agent and
cf-serverd is encrypted.
- cf-monitord: Collects statistics about resource usage on each host
for anomaly detection purposes. The information is made available to the
agent in the form of CFEngine classes so that the agent can check for
and respond to anomalies dynamically.
- cf-key: Generates public-private key pairs on a host. You normally
run this program only once, as part of the CFEngine software
installation process.
On a client system, cf-agent will be executed automatically by the
cf-execd daemon; the latter also handles logging during cf-agent runs.
In addition, operations such as file copying between hosts are initiated
by cf-agent on the local system, and they rely on the cf-serverd daemon
on the Master Policy Server to obtain remote data.
High Level Architecture of pushing configurations
SVN becomes the source of truth for CFEngine. The architecture we are using allows us to start with only one “Master Policy Server” or “Distribution Server” per site, but we can easily scale to multiple machines if needed.
- A cron entry on the Master Policy Server checks the SVN repository at svn:/// every minute. If an updated configuration is detected, it downloads the client configurations into /var/cfengine/masterfiles on the Master Policy Server.
- Depending upon the value configured for “splaytime” on the clients, they check in at random points over a given period of, say, 10 minutes. The new policy file that was downloaded to /var/cfengine/masterfiles is served by cf-serverd on the Master Policy Server. The cf-agent command on the client pulls it over an encrypted connection into /var/cfengine/inputs.
- The client runs the cf-execd daemon through SMF. The cf-execd daemon periodically wakes up to execute cf-agent, which runs the policies in /var/cfengine/inputs. If a new policy was transferred to the client, cf-agent will execute it.
The data flow on performing a change is as follows:
Pushing Configuration Changes
1. I make a config change on my local machine and push it to SVN.
(push --> SVN)
2. An updated configuration is detected. The cron script downloads the
changes into /var/cfengine/masterfiles on the policy server.
(<-- pull from SVN)
3. The policy server running cf-serverd now has updated configurations
in /var/cfengine/masterfiles to serve to clients. (<-- pulled from SVN
by the cron script)
4. Clients running the cf-execd daemon execute cf-agent on a schedule
(by default every 5 minutes).
5. cf-agent looks at the configured “splaytime” variable to figure out
how long to wait before contacting cf-serverd (it computes a hash and
checks in at a random point in the interval). This random “back off”
time keeps the master policy server from being hammered all at once by
thousands of clients. If clients check in randomly over a 10 minute
interval, then we have fewer bursts of network I/O, etc.
6. cf-agent contacts cf-serverd running on the Master Policy Server(s)
and pulls updated policies, configs, etc. via an encrypted link. This
happens via execution of failsafe.cf and update.cf. (<-- pull from
Master Policy Servers) Clients pull. Servers don’t “push”.
Changes are done on the client opportunistically. If the network is
down, nothing happens on the clients. The next time the client can
contact the Master Policy Server, the change is executed.
7. cf-agent executes policies via promises.cf. Changes happen on the
client here.
8. cf-execd records the details of the promises.cf run in
/var/cfengine/outputs.
9. cf-monitord records the behavior of the machine in
/var/cfengine/reports.
10. cf-execd is kept running and monitored by Solaris SMF on the client.
11. cf-monitord is kept running and monitored by Solaris SMF on the
client.
12. cf-report is run manually through the CLI. cf-report analyzes data
collected by cf-monitord in /var/cfengine/reports and outputs to HTML,
text, XML, etc.
13. The predefined schedule of XXX minutes passes again and cf-execd
executes cf-agent again. Repeat from step 4.
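
The splay idea in the flow above can be sketched in a few lines of
Python. This is only an illustration of "hash the host, then spread the
check-ins over an interval"; it is not CFEngine's actual algorithm, and
the function name splay_delay_seconds and the choice of SHA-256 are my
own assumptions.

```python
import hashlib


def splay_delay_seconds(host_id: str, splay_minutes: int) -> int:
    """Map a host identifier to a stable delay inside the splay window.

    The same host always gets the same delay, while different hosts
    are spread roughly evenly over [0, splay_minutes * 60) seconds.
    """
    window = splay_minutes * 60
    if window <= 0:
        # No splay configured: contact the server immediately.
        return 0
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as a large integer and fold
    # it into the window with a modulo.
    return int.from_bytes(digest[:8], "big") % window
```

With a 10 minute window, each client waits somewhere between 0 and 599
seconds before contacting the policy server, so thousands of clients
waking up on the same schedule do not all connect in the same second.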
Why does everything reside in /var/cfengine? How is CFengine resilient to failures?
CFEngine is designed to be as resilient as possible. Some environments have /usr/local NFS mounted, so /var/cfengine was chosen because it is pretty much guaranteed to be kept on local disk.
- Binaries that get executed reside in /var/cfengine/bin. Pristine copies of the binaries reside in /var/cfengine/sbin. Every time cf-agent executes failsafe.cf (which calls update.cf), it verifies that the MD5 digests of the binaries in /var/cfengine/bin match those in /var/cfengine/sbin. If the digests don’t match, or if the permissions or ownership have changed, the binaries are automatically copied from /var/cfengine/sbin to /var/cfengine/bin. This is a fail-safe protection mechanism that lets CFEngine attempt to recover automatically from some sort of corruption.
- If you look at the “Part 2 — How I compiled CFEngine” page, you’ll see that we manually changed some configurations in the Makefile. This was to ensure that libpcre, libgcc.so.1, and libcrypto.a were statically compiled into the CFEngine client binaries. We don’t want CFEngine to rely on software under /usr/sfw/lib or /usr/local/lib. It’s completely self-contained in /var/cfengine (other than general system libraries).
- cf-agent actually gets executed twice on each run. The first run executes failsafe.cf to update all policy files from the master policy server, but does not actually execute the policies. The second run executes promises.cf and really performs the changes. We modify promises.cf. We never modify failsafe.cf or update.cf once in production.
- This means that a syntax error in promises.cf does not stop the clients from recovering in an automated fashion. If promises.cf is corrupt, we can’t actually execute policies. But if failsafe.cf and update.cf are in a good state, the clients will continue to poll the master policy server for updated copies of files.
- Once we correct the syntax error in promises.cf, clients pull the corrected promises.cf, and the auto-recovery of the configs is complete.
- If you break failsafe.cf or update.cf on the clients, then the clients will have to be touched manually to recover. Don’t modify these configurations once in a production environment, or be extremely careful to test your changes if you absolutely must.
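
The bin versus sbin self-repair described above boils down to "compare a digest, and copy the pristine file back if it differs". Here is a rough Python sketch of that check. It is not the policy from update.cf; the function names are hypothetical, it uses SHA-256 instead of the MD5 digest mentioned above, and it only compares file contents (the ownership and permission checks are left out, although shutil.copy2 does carry the pristine file's mode bits across).

```python
import hashlib
import os
import shutil


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def restore_if_changed(pristine: str, live: str) -> bool:
    """Copy `pristine` over `live` if `live` is missing or differs.

    Returns True if a repair was made, False if `live` was already
    identical to the pristine copy.
    """
    if os.path.exists(live) and file_digest(live) == file_digest(pristine):
        return False
    # Missing or corrupted: put the known-good copy back in place.
    shutil.copy2(pristine, live)
    return True
```

In terms of the layout above, pristine would be a file under /var/cfengine/sbin and live the matching file under /var/cfengine/bin. Like any convergent operation, running the check again right after a repair does nothing.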
This article was posted by Matty on 2010-07-02 10:54:00 -0400