T3_US_UCD Documentation

T3_US_UCD is a Beowulf cluster owned by John Conway and used by CERN researchers on the CMS project. This page documents the hardware, software, and policies surrounding this resource. The announcement archives are available online.

Announcements

Announcement notifications are currently sent through a proxy (John Conway). We hope to set up a mailing list to make this process more transparent.

Terminology

The terminology and technology supporting the CMS project are often hard to navigate. We maintain a terminology page that should provide some help.

Monitoring

[Graph: CMS CPU load]

Ganglia

You can also monitor the cluster using the Ganglia interface, which lets you view just the nodes associated with your jobs: click the “Job Queue” link and look for your job ID. We collect a lot of data, including memory usage, load average, disk activity, and network activity.

RSV

RSV probes also run occasionally to test the various pieces of the OSG infrastructure running here at UC Davis. The probe results are reported to the OSG servers in Indiana. You can see the status page in MyOSG.

GIP

The Generic Information Provider (GIP) provides information on what resources are installed and where they are located on the internet. This information is configured on the primary node (cms.tier3) and is periodically sent to OSG, then on to WLCG. If this process stops or the connection breaks, the WLCG will not know where our resources are and CRAB jobs will begin to fail. The GIP status can be checked on the MyOSG GIP status page for UCD here.

PhEDEx

PhEDEx is a collection of Perl scripts that transfer data from other sites. It runs on agent.tier3. You can see the status of any pending downloads for the Prod, Dev, and Debug instances. Typically only load tests should be active in the Debug instance. You can also see historical information on the cmsweb PhEDEx site.

SAM

CERN runs the Site Availability Monitoring (SAM) tests. These tests get information from GIP/BDII and occasionally test the site's availability by submitting real jobs and transferring real data. This shouldn't add too much load, but it is a real-life remote test. You can see the status of the Compute Element tests and Storage Element tests for the OSG sites. We are “ucd”.

FronTier/Squid

Squid provides a caching layer for some non-local data. UCD data can be viewed here.

Gratia Accounting

Gratia provides information on which users and VOs are using your resources. This can be useful in determining why a resource is so busy. You can get useful information from the Gratia page for UCD in MyOSG.

Policies

The policies surrounding using this resource are outlined on this page.

Software

Any software that is available in Scientific Linux CERN 5 is either already installed on this cluster or available for installation. This accounts for the majority of the installed software. Custom packages, including compilers, MPI layers, open source packages, and commercial packages, are installed in /share/apps and are available to all nodes in the cluster. We have set up a page that has details on the current software configuration.
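As a quick way to see which custom packages are present, the following is a minimal sketch that lists the top-level directories under /share/apps. It assumes each package lives in its own directory there, which this page does not confirm; the function name list_shared_apps is hypothetical and used only for illustration.

```python
from pathlib import Path


def list_shared_apps(root="/share/apps"):
    """Return the sorted names of the top-level directories under root.

    Assumption: each custom package is installed in its own directory
    directly below root. Returns an empty list if root does not exist.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(p.name for p in base.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Print one package directory per line.
    for name in list_shared_apps():
        print(name)
```

Because /share/apps is available to all nodes, running this on any node should give the same result. The software configuration page remains the authoritative list.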

OS, Provisioning, Configuration Management

This cluster currently runs Scientific Linux CERN v5 (SLC5). SLC5 is maintained by CERN and is based on Scientific Linux, which in turn is based on Red Hat Enterprise Linux. For provisioning we use Cobbler, and for configuration management we use Puppet.

Hardware

The hardware is made up of the following:

  • 40 dual-socket, quad-core AMD compute nodes with 12 GB RAM and 2x2 TB of disk
  • 4 24-core AMD Opteron compute nodes with 64 GB RAM and 3x4 TB of disk
  • 4 storage nodes
  • 1 KVM console
  • 2 APC 48U racks
  • 2 APC UPSes
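
The aggregate compute capacity implied by the list above can be worked out with a short sketch. It rests on assumptions not stated on this page: that "quad-core, dual socket" means 2 x 4 = 8 cores per node, that "24 core" is the per-node total, and that the RAM and disk figures are per node. Storage nodes are left out because their capacity is not listed. The NodeGroup class and totals function are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    count: int           # number of nodes in the group
    cores_per_node: int  # assumed total cores per node
    ram_gb: int          # assumed RAM per node, in GB
    disks: int           # disks per node
    disk_tb: int         # capacity per disk, in TB


# Figures from the hardware list; per-node interpretation is an assumption.
GROUPS = [
    NodeGroup(count=40, cores_per_node=2 * 4, ram_gb=12, disks=2, disk_tb=2),
    NodeGroup(count=4, cores_per_node=24, ram_gb=64, disks=3, disk_tb=4),
]


def totals(groups):
    """Sum nodes, cores, RAM, and raw disk across all compute node groups."""
    return {
        "nodes": sum(g.count for g in groups),
        "cores": sum(g.count * g.cores_per_node for g in groups),
        "ram_gb": sum(g.count * g.ram_gb for g in groups),
        "raw_disk_tb": sum(g.count * g.disks * g.disk_tb for g in groups),
    }


if __name__ == "__main__":
    print(totals(GROUPS))
    # {'nodes': 44, 'cores': 416, 'ram_gb': 736, 'raw_disk_tb': 208}
```

Under these assumptions the compute nodes provide 416 cores, 736 GB of RAM, and 208 TB of raw local disk.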
support/systems/cms.txt · Last modified: 2014/04/05 10:45 by tlknight