Thursday, April 24, 2014

hpacucli - Linux

Linux - hpacucli
This document is a quick cheat sheet on how to use the hpacucli utility to add, delete, identify and repair logical and physical disks on the Smart Array 5i Plus controller. The commands were tested on an HP DL380 G3 server with a Smart Array 5i Plus controller and 6 x 72GB hot-swappable disks, running Oracle Enterprise Linux (OEL).
After a fresh install of Linux I downloaded the file hpacucli-8.50-6.0.noarch.rpm (5MB); you may want to download the latest version from HP. Then install it using the standard rpm command.
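For example, using the version downloaded above:

# rpm -ivh hpacucli-8.50-6.0.noarch.rpm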
I am not going to list all the commands, but here are the most common ones I have used so far; this document may be updated as I use the utility more.

Utility keyword abbreviations
chassisname = ch
controller = ctrl
logicaldrive = ld
physicaldrive = pd
drivewritecache = dwc

Starting the hpacucli utility
# hpacucli

# hpacucli help

Note: you can use the hpacucli command in a script
Controller Commands

Display (detailed)
hpacucli> ctrl all show config
hpacucli> ctrl all show config detail

Status
hpacucli> ctrl all show status

Cache
hpacucli> ctrl slot=0 modify dwc=disable
hpacucli> ctrl slot=0 modify dwc=enable

Rescan
hpacucli> rescan

Note: detects newly added devices since the last rescan
Physical Drive Commands

Display (detailed)
hpacucli> ctrl slot=0 pd all show
hpacucli> ctrl slot=0 pd 2:3 show detail

Note: you can obtain the slot number by displaying the controller configuration (see above)

Status
hpacucli> ctrl slot=0 pd all show status
hpacucli> ctrl slot=0 pd 2:3 show status

Erase
hpacucli> ctrl slot=0 pd 2:3 modify erase

Blink disk LED
hpacucli> ctrl slot=0 pd 2:3 modify led=on
hpacucli> ctrl slot=0 pd 2:3 modify led=off
Logical Drive Commands

Display (detailed)
hpacucli> ctrl slot=0 ld all show [detail]
hpacucli> ctrl slot=0 ld 4 show [detail]

Status
hpacucli> ctrl slot=0 ld all show status
hpacucli> ctrl slot=0 ld 4 show status

Blink disk LED
hpacucli> ctrl slot=0 ld 4 modify led=on
hpacucli> ctrl slot=0 ld 4 modify led=off

Re-enable a failed logical drive
hpacucli> ctrl slot=0 ld 4 modify reenable forced

Create
# logical drive - one disk
hpacucli> ctrl slot=0 create type=ld drives=1:12 raid=0

# logical drive - mirrored
hpacucli> ctrl slot=0 create type=ld drives=1:13,1:14 size=300 raid=1

# logical drive - raid 5
hpacucli> ctrl slot=0 create type=ld drives=1:13,1:14,1:15,1:16,1:17 raid=5

Note:
drives - specific drives, all drives or unassigned drives
size - size of the logical drive in MB
raid - RAID level: 0, 1, 1+0 or 5

Remove
hpacucli> ctrl slot=0 ld 4 delete

Expand (add physical drives to the array)
hpacucli> ctrl slot=0 ld 4 add drives=2:3

Extend (grow the logical drive's size)
hpacucli> ctrl slot=0 ld 4 modify size=500 forced

Spare
hpacucli> ctrl slot=0 array all add spares=1:5,1:7
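A typical failed-disk replacement, sketched using only the commands above (the drive address 2:3 is just an example): blink the LED to locate the drive, swap it, then watch the rebuild:

hpacucli> ctrl slot=0 pd 2:3 modify led=on
(physically replace the disk)
hpacucli> ctrl slot=0 pd 2:3 modify led=off
hpacucli> ctrl slot=0 pd 2:3 show status
hpacucli> ctrl slot=0 ld all show status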

SSSD : LDAP auth on Linux




Turning on LDAP authentication for Linux has changed. Significantly. Some serious advice to both old and new timers below.

Enabling LDAP authentication used to involve invoking nslcd, the Local LDAP Name Service Daemon. In my experience it worked, but was ugly. Red Hat and others still offer nslcd, but as Dik writes:
It’s broken, convoluted, and not well documented. Worst, there’s a lot of bad advice floating around the Internet in places like StackOverflow, ServerFault, ExpertsExchange, etc.
Of course you could say the same thing about a lot of open source (and most closed source) software as well. Documentation and clear instruction are sadly not the strong suit of many technologists, or the companies that hawk their wares. That’s why I especially appreciated this line from the above-referenced blogger, “Ignore it all. Just read this page. Ignore any piece of documentation that has you configuring nslcd.conf.” In fact the guy doesn’t stop there, he goes on to write:
Fedora/RedHat realized how terrible PADL software is, so they wrote their own stuff; it’s called SSSD. It’s a terrible name, but overall it works pretty well. Use SSSD, don’t use nslcd or anything that has pam_ldap or ldapd in the name. Just use SSSD.
Now that’s clarity.
So is, as it turns out, Red Hat’s latest documentation on the subject: in the Fedora 18 System Administrator’s Guide. An older, but fuller treatment can be found in the Fedora 15 Deployment Guide.

1. Install sssd and authconfig if they aren’t already. The packages you’ll want are:
sssd-client
sssd-common
sssd-common-pac
sssd-ldap
sssd-proxy
python-sssdconfig
authconfig
authconfig-gtk
The sssd package itself is a "meta" package that gets pulled in by one or more of these others. My Fedora 19 installation from the Live DVD already had all of these loaded.
2. Check the current settings for sssd, if any:
authconfig --test
This will show the settings already in place. Generally at this stage everything is disabled.
Check for an existing /etc/sssd/sssd.conf file. If this is a new installation where LDAP authentication has not been set up before, the file will not exist, although the directory will.
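For example:

ls -l /etc/sssd/sssd.conf

On a fresh installation this will typically report "No such file or directory", while /etc/sssd itself exists.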
3. Configure sssd.
authconfig \
--enablesssd \
--enablesssdauth \
--enablelocauthorize \
--enableldap \
--enableldapauth \
--ldapserver=ldap://ldap.example.com:389 \
--disableldaptls \
--ldapbasedn=dc=example,dc=com \
--enablerfc2307bis \
--enablemkhomedir \
--enablecachecreds \
--update
A few notes:
(a) It is extremely important to include "enablelocauthorize" (matching the spelling in the command above), which allows local account (/etc/passwd) values to override network (LDAP) values. This will allow you to log into the server if your LDAP directory goes down.
(b) The LDAP URI you enter will depend upon whether you're going to connect over unencrypted LDAP, LDAPS (LDAP over SSL) or LDAP with TLS. Password changes may not work over unencrypted LDAP, and it's eventually going to be deprecated. LDAPS is also supposed to be deprecated in a future release (there are concerns about the efficacy of its security). As a result, using TLS is highly recommended.
(c) Even if you're going to use LDAP TLS, use the disableldaptls option on initial setup to avoid an abend due to a failure to provide a certificate URL. I prefer to keep my certificates in a better place than the default /etc/openldap/certs (my preference is the common system location /etc/pki/tls/certs).
(d) If you want to use LDAP groups, be sure to include enablerfc2307bis; this is the schema variant that recognizes uniqueMember as the attribute for storing group member DNs.
(e) Neither enablemkhomedir nor enablecachecreds is required, but both are recommended: the former saves administrators a maintenance step by letting the system create the user's home directory on first login, and the latter helps avoid the consequences of periodic network connectivity "hiccups".
(f) If you use the authconfig tool you should not have to edit any other files such as /etc/nsswitch.conf; authconfig will have done that for you (see below for one exception). The sssd service should also be up and running with the new configuration and enabled to start on reboot.
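A quick sanity check at this point: query a known directory user through NSS (the username "jdoe" here is just a hypothetical example; it should exist in LDAP but not in /etc/passwd):

getent passwd jdoe
id jdoe

If these return the LDAP entry, name resolution through sssd is working.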
4. Check the configuration in /etc/sssd/sssd.conf. In particular you’ll need to edit it so that the ldap_tls_cacertdir and ldap_tls_cacert parameters have valid (real) paths to your certificates. If you’re going to use TLS (which you really should if your LDAP directory supports it — most, including OpenDJ, do), change “ldap_id_use_start_tls” to “True”.
[domain/default]
autofs_provider = ldap
ldap_schema = rfc2307bis
krb5_realm = #
ldap_search_base = dc=example,dc=com
id_provider = ldap
auth_provider = ldap
chpass_provider = ldap
ldap_uri = ldap://ldap.example.com:389
ldap_id_use_start_tls = True
cache_credentials = True
ldap_tls_cacertdir = /etc/pki/tls/certs
ldap_tls_cacert = /etc/pki/tls/certs/mybundle.pem

[sssd]
services = nss, pam, autofs
config_file_version = 2
domains = default

[nss]

[pam]

[sudo]

[autofs]

[ssh]

[pac]
Restart sssd to effect these changes:
systemctl restart sssd
DO NOT run authconfig with the --update option after hand-editing the configuration file; it will regenerate the file and wipe out any changes you've made. Restart the service instead.
Then run a check to make sure they’ve been read in correctly:
authconfig --test
5. Update /etc/openldap/ldap.conf to follow the same configuration. It should look something like this when you’re done:
SASL_NOCANON    on
URI ldap://ldap.example.com:389
BASE dc=example,dc=com
TLS_REQCERT never
TLS_CACERTDIR /etc/pki/tls/certs
TLS_CACERT /etc/pki/tls/certs/mybundle.pem
That "TLS_REQCERT never" is for the benefit of application stacks like PHP that leverage the system's LDAP libraries but have difficulty with LDAPS and TLS, even when dealing with certs signed by an external authority.
6. Make sure that sssd is up and running, and enabled to start when the system next reboots. Use "systemctl status sssd" to check this. If it isn't, use "systemctl enable sssd" and "systemctl start sssd".

Sometimes it is best to restart the service to ensure that the cache is cleared out and all changes are applied. Use "systemctl restart sssd" for this.

                      ########## SECOND METHOD ##########

Kickstart and build stuff aside, the biggest problem we had with building some new CentOS 6 test boxes had to do with LDAP. You see, RedHat (and CentOS as a result) now supports 2 different providers for LDAP authentication. That's right, two. The bad thing is that it's 2 *new* providers. It's not the "new way" and the "old way." It's the "new way" and the "other new way." Those looking for seamless upgrades, keep wishing. Those who want to figure out how to do this easily, read on.

Basically, the old PADL NSS stuff is dead. They realized what a steaming pile of shit it was (memory leaks and all) and decided to scrap it. So they took a lot of the same stuff, renamed it, and pushed it out the door. I'll call this the "nslcd/openldap/legacy stuff." This is the closest method to "the old way" of doing things. But here's the catch, they fucked it all up. It's broken, convoluted, and not well documented. Worst, there's a lot of bad advice floating around the Internet in places like StackOverflow, ServerFault, ExpertsExchange, etc. Ignore it all. Just read this page. Ignore any piece of documentation that has you configuring nslcd.conf.

Fedora/RedHat realized how terrible PADL software is, so they wrote their own stuff; it's called SSSD. It's a terrible name, but overall it works pretty well. Use SSSD, don't use nslcd or anything that has pam_ldap or ldapd in the name. Just use SSSD.

Here's the idiot's guide, super easy configuration:
  1. yum install sssd
  2. authconfig --enablesssd --enablesssdauth --enablelocauthorize --update
  3. Edit /etc/sssd/sssd.conf to look similar to this (I'm not going through each item -- RTFM instead):
    [sssd]
    config_file_version = 2
    services = nss, pam
    domains = default

    [nss]
    filter_users = root,ldap,named,avahi,haldaemon,dbus,radiusd,news,nscd

    [pam]

    [domain/default]
    ldap_tls_reqcert = never
    auth_provider = ldap
    ldap_schema = rfc2307bis
    krb5_realm = EXAMPLE.COM
    ldap_search_base = dc=domain,dc=com
    ldap_group_member = uniquemember
    id_provider = ldap
    ldap_id_use_start_tls = False
    chpass_provider = ldap
    ldap_uri = ldaps://ldapserver1/,ldaps://ldapserver2/
    ldap_chpass_uri = ldaps://your.ldapwrite.server/
    krb5_kdcip = kerberos.example.com
    cache_credentials = True
    ldap_tls_cacertdir = /etc/openldap/cacerts
    entry_cache_timeout = 600
    ldap_network_timeout = 3
    ldap_access_filter = (&(object)(object))
  4. Change the passwd, shadow, and group sections of /etc/nsswitch.conf to be "files sss" (see the example just after this list). Do not use "files ldap". If you choose "files ldap", you'll tell the system to use the shitty PADL nslcd crap. Don't do that!
  5. service sssd restart
  6. After that, you should be able to type "id $user" and get something back from LDAP. You can make sure it's using the right LDAP servers by checking netstat (netstat -anp | grep sssd_be).
  7. That's it. Don't mess with nslcd.conf. Don't install any nss-pam-ldapd packages or ldapd or anything. Just don't do it. Use the RedHat/Fedora stuff and tell PADL to kiss your ass.
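For step 4, the relevant lines of /etc/nsswitch.conf end up looking like this:

passwd:     files sss
shadow:     files sss
group:      files sss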
Setting up autofs, sudo, etc to use LDAP is almost exactly like it was in CentOS 5. For example, you do want to add "ldap" to nsswitch.conf for autofs. My one recommendation would be to ditch the RH/CentOS sudo packages and install one of the RPMs from the sudo page. You'll be on the mainline versions *and* you'll avoid the stupid /etc/ldap.conf /etc/nslcd.conf crap that RedHat ran into in their version of sudo. In short, they updated the sudo package to look for configuration information in /etc/nslcd.conf, but the nslcd binary won't start if it sees directives it doesn't understand in its conf file. Basically, if you use the "old PADL LDAP nslcd" crappy way of LDAP auth, you can't use sudo. So don't use it. Stick with the basic SSSD stuff and get a sudo RPM from the sudo.ws page that looks for information in /etc/ldap.conf.

Oh, and if you use nscd with sssd, be sure to set the passwd and group caches to "no". It's good to run nscd as a DNS hostname cache, but its user and group caching conflicts with sssd's (which does its own).
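In /etc/nscd.conf that means something like the following (hosts caching left on; a sketch, check your distro's defaults):

enable-cache            passwd          no
enable-cache            group           no
enable-cache            hosts           yes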

5 ways to improve HDD speed on Linux

(If you still think this post is about making Windows load faster, then press ALT+F4 to continue)
Our dear Mr. Client




It's a fine sunny Sunday morning. Due tomorrow is your presentation to a client on improving disk I/O. You pull yourself up by your bootstraps and manage to climb out of bed and onto your favorite chair...
I think you forgot to wear your glasses...





Aahh, that's better... You jump onto the couch, turn on your laptop and launch (your favorite presentation app here). As you sip your morning coffee and wait for the app to load, you look out of the window and wonder what it could be doing.
Looks simple, right? Then why is it taking so long?


If you would be so kind as to wipe your rose-coloured glasses clean, you would see that this is what is ACTUALLY happening:


0. The app (running in RAM) decides that it wants to play spin-the-wheel with your hard-disk.
1. It initiates a disk-I/O request to read some data from the HDD to RAM(userspace).
2. The kernel does a quick check in its page-cache (again in RAM) to see if it already has this data from an earlier request. Since you just switched on your computer,...
3. ...the kernel did NOT find the requested data in the page-cache. "Sigh!" it says, starts its bike and begins its journey all the way to HDD-land. On the way, the kernel decides to call up its old friend "HDD-cache" and tells him that he will be arriving shortly to collect a package of data from HDD-land. HDD-cache, the good friend as always, tells the kernel not to worry and that everything will be ready by the time he arrives in HDD-land.
4. HDD-cache starts spinning the HDD-disk...
5. ...and locates and collects the data.
6. The kernel reaches HDD-land and picks-up the data package and starts back.
7. Once back home, it saves a copy of the package in its cache in case the app asks for it again. (Poor kernel has NO way of knowing that the app has no such plans).
8. The kernel gives the package of data to the app...
9. ...which promptly stores it in RAM(userspace).


Do keep in mind that this is how it works in the case of extremely disciplined, well-behaved apps. At this point, misbehaving apps tend to go -


"Yo kernel, ACTUALLY, i didn't allocate any RAM, i wanted to just see if the file existed. Now that i know it does, can you please send me this other file from some other corner of HDD-land."
...and the story continues...

Time to go refill your coffee-cup. Go go go...


Hmmm... you are back with some donuts too. Nice caching!


So as you sit there having coffee and donuts, you wonder how one really improves disk-I/O performance. Improving performance can mean different things to different people:


  1. Apps should NOT slow down waiting for data from disk.
  2. One disk-I/O-heavy app should NOT slow down another app's disk-I/O.
  3. Heavy disk-I/O should NOT cause increased cpu-usage.
  4. (Enter your client's requirement here)


So when the disk-I/O throughput is PATHETIC, what does one do?...
5 WAYS to optimise your HDD throughput!



1. Bypass page-cache for "read-once" data.

What exactly does the page-cache do? It caches recently accessed pages from the HDD, thus reducing seek-times for subsequent accesses to the same data. The key word here is subsequent. The page-cache does NOT improve performance the first time a page is accessed from the HDD. So if an app is going to read a file once and just once, then bypassing the page-cache is the better way to go. This is possible by using the O_DIRECT flag, which tells the kernel NOT to consider this particular data for the page-cache. Reducing cache-contention means that other pages (which would be accessed repeatedly) have a better chance of being retained in the page-cache. This improves the cache-hit ratio, i.e. better performance.

#define _GNU_SOURCE   /* needed for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

void ioReadOnceFile()
{
/*  Using direct_fd and direct_f bypasses the kernel page-cache.
 *  - direct_fd is a low-level file descriptor
 *  - direct_f is a filestream similar to one returned by fopen()
 *  NOTE: O_DIRECT requires block-aligned buffers, offsets and
 *  transfer sizes; use getpagesize()/posix_memalign() for
 *  determining optimally sized and aligned buffers.
 */

int direct_fd = open("filename", O_DIRECT | O_RDWR);
FILE *direct_f = fdopen(direct_fd, "w+");

/* direct disk-I/O done HERE */

fclose(direct_f);  /* also closes the underlying direct_fd */
}



2. Bypass page-cache for large files.

Consider the case of reading in a large file (e.g. a database) made up of a huge number of pages. Every page accessed gets into the page-cache, only to be dropped out later as more and more pages are read. This severely reduces the cache-hit ratio. In this case the page-cache does NOT provide any performance gains, so one is better off bypassing it when accessing large files.

#define _GNU_SOURCE   /* needed for O_DIRECT and O_LARGEFILE */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

void ioLargeFile()
{
/*  Using direct_fd and direct_f bypasses the kernel page-cache.
 *  - direct_fd is a low-level file descriptor
 *  - direct_f is a filestream similar to one returned by fopen()
 *  NOTE: O_DIRECT requires block-aligned buffers, offsets and
 *  transfer sizes; use getpagesize()/posix_memalign() for
 *  determining optimally sized and aligned buffers.
 */

int direct_fd = open("largefile.bin", O_DIRECT | O_RDWR | O_LARGEFILE);
FILE *direct_f = fdopen(direct_fd, "w+");

/* direct disk-I/O done HERE */

fclose(direct_f);  /* also closes the underlying direct_fd */
}



3. If (cpu-bound) then scheduler == no-op;

The I/O scheduler optimises the order of I/O operations queued up for the HDD. As seek-time is the heaviest penalty on an HDD, most I/O schedulers attempt to minimise it. This is implemented as a variant of the elevator algorithm, i.e. re-ordering the randomly ordered requests from numerous processes into the order in which the data is laid out on the HDD. This re-ordering can require a significant amount of CPU time.
Certain tasks that involve complex operations tend to be limited by how fast the CPU can process vast amounts of data. A complex I/O scheduler running in the background can consume precious CPU cycles, thereby reducing system performance. In this case, switching to a simpler algorithm like noop reduces the CPU load and can improve system performance.
echo noop > /sys/block/<block-dev>/queue/scheduler
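You can check which scheduler is currently active for a device (the active one is shown in square brackets):

cat /sys/block/<block-dev>/queue/scheduler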



4. Block-size: Bigger is Better

Q. How will you move Mount Fuji to Bangalore?
Ans. Bit by bit.
While this will eventually get the job done, it's definitely NOT the most optimal way. From the kernel's perspective, the most optimal size for I/O requests is the filesystem block-size (i.e. the page-size). As all I/O in the filesystem (and the kernel page-cache) is in terms of pages, it makes sense for the app to do transfers in multiples of the page-size too. Also, with multi-segmented caches making their way into HDDs now, one would hugely benefit by doing I/O in multiples of the block-size.
Barracuda 1TB HDD : optimal I/O block-size 2MB (= 4 blocks)
The following command can be used to determine the optimal block-size
stat --printf="bs=%s optimal-bs=%S\n" --file-system /dev/<block-dev> 
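Here is a minimal sketch of the idea in C: allocate a page-aligned buffer whose size is a multiple of the page-size, and read in those units (the file name and the 16-page multiple are just illustrative assumptions to be tuned per device):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);  /* same value getpagesize() returns */
    size_t bufsize = 16 * (size_t)pagesize; /* transfer in multiples of the page-size */
    void *buf;

    /* page-aligned buffer; this also satisfies O_DIRECT alignment rules */
    if (posix_memalign(&buf, (size_t)pagesize, bufsize) != 0)
        return 1;

    int fd = open("largefile.bin", O_RDONLY); /* hypothetical file */
    if (fd < 0) { free(buf); return 1; }

    ssize_t n;
    while ((n = read(fd, buf, bufsize)) > 0) {
        /* process n bytes here */
    }

    close(fd);
    free(buf);
    return 0;
}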
 


5. SYNC vs. ASYNC (& read vs. write)

ASYNC I/O, i.e. non-blocking mode, is effectively faster when combined with caching


When an app initiates a SYNC I/O read, the kernel queues a read operation for the data and returns only after the entire block of requested data has been read back. During this period, the kernel marks the app's process as blocked for I/O. Other processes can utilise the CPU, resulting in overall better performance for the system.

When an app initiates a SYNC I/O write, the kernel queues a write operation for the data and puts the app's process in the blocked-for-I/O state. Unfortunately, this means the app's process is blocked and cannot do any other processing (or I/O, for that matter) until the write operation completes.

When an app initiates an ASYNC I/O read, the read() function usually returns after reading only a subset of the requested block of data. The app needs to repeatedly call read() with the size of the data remaining, until all the required data is read in. Each additional call to read() introduces some overhead, as it causes a context switch between userspace and the kernel. Implementing a tight loop to repeatedly call read() wastes CPU cycles that other processes could have used. Hence one usually blocks using select() until the next read() returns a non-zero number of bytes read, i.e. the ASYNC read is made to block just like the SYNC read does.
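A sketch of that select()-based pattern in C (an illustrative helper, not from the original post; note that on Linux select() reports regular files as always readable, so this pattern really applies to fds like pipes and sockets):

#include <sys/select.h>
#include <unistd.h>

/* Read up to 'count' bytes from a non-blocking fd, sleeping in select()
 * instead of spinning in a tight loop around read(). */
ssize_t read_all_nonblocking(int fd, char *buf, size_t count)
{
    size_t done = 0;
    while (done < count) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        /* block here until fd has data, instead of burning CPU */
        if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
            return -1;

        ssize_t n = read(fd, buf + done, count - done);
        if (n < 0)
            return -1;  /* a robust version would retry on EAGAIN */
        if (n == 0)
            break;      /* EOF */
        done += n;
    }
    return (ssize_t)done;
}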

When an app initiates an ASYNC I/O write, the kernel updates the corresponding pages in the page-cache and marks them dirty. Control then quickly returns to the app, which can continue to run. The data is flushed to the HDD later, at a more optimal time (low CPU load) and in a more optimal way (sequentially bunched writes).

Hence, SYNC reads and ASYNC writes are generally a good way to go, as they allow the kernel to optimise the order and timing of the underlying I/O requests.

There you go. I bet you now have quite a lot of things to say in your presentation about improving disk-IO. ;-)



PS: If your client fails to comprehend all this (just like when he saw Inception for the first time), then do not despair. Ask him to go buy a freaking-fast SSD and he will never bother you again.