How to Enable Crypto Acceleration on the BeagleBone Black

This HOWTO describes the process of enabling acceleration for certain cryptographic algorithms on the BeagleBone Black (BBB).  A week ago, I tried and failed due to all sorts of kernel module problems, but it now appears I have everything in order.  Specifically, I will detail how to configure OpenSSL to use the BBB crypto hardware.  Update 3/22/14: In the 3.13 kernel, the OMAP TI crypto drivers are enabled by default (for the BBB images).

Rack 'em and Stack 'em: Stacked BBBs.  (Photo op configuration only b/c it's probably not the best for heat dissipation...)

Instructions:

  1. Download and flash the Debian eMMC flasher image.
  2. Run sudo apt-get update and sudo apt-get install build-essential to get a toolchain on the BBB.
  3. Download the cryptodev-linux-1.6.tar.gz driver source.  This module gives user-space applications access to the hardware accelerators.
  4. Download the Linux kernel headers provided by Robert Nelson.  First run uname -a on the BBB to see which kernel version you have.  I was running v3.8.13-bone26, so that’s the folder to navigate to.  You’ll want to download the linux-headers .deb for your version. If you have v3.8.13-bone26, your file is here.
  5. Run sudo dpkg -i linux-headers-3.8.13-bone26_1.0wheezy_armhf.deb.
  6. There is a slight problem with one of the headers.  Basically, RNelson’s deb doesn’t install all the headers because he was trying to save precious space on the BBB.  So, you need to make one tweak (thankfully, I stumbled on this post, which gave me the idea):
    sudo nano /usr/src/linux-headers-3.8.13-bone26/arch/arm/include/asm/timex.h
    Remove / comment out the line: #include <mach/timex.h> and replace it with:
    #include <usr/src/linux-headers-3.8.13-bone26/arch/arm/include/asm/timex.h>
  7. Run tar zxf cryptodev-linux-1.6.tar.gz, cd into the resulting directory, then run make and sudo make install.
  8. Run sudo depmod -a to register your module.
  9. Run sudo modprobe cryptodev to insert it.
  10. Run lsmod and you should see cryptodev in the list!
  11. Edit /etc/modules and put cryptodev on a line by itself at the end of the file (this will make sure the module inserts on boot).
  12. Ok, we are done with the module, so go back and download OpenSSL (the starred version), then tar zxf openssl* and cd into that directory.  TI’s instructions say to apply a TI patch for OpenSSL, but that patch was a year old, so I’m not sure it’s current.  I did not install it.
  13. Run ./config -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS shared
  14. Run make (this takes a long time).
  15. Run sudo make install. One thing to note: this installs openssl in /usr/local/ssl/bin, which does not take precedence over /usr/bin/openssl in your PATH.  So you should either change the default install directory or update symlinks as appropriate.
  16. Enjoy!
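Steps 10 and 11 can also be checked from a script, for example when setting up several boards. The Python sketch below is hypothetical: the helper names module_loaded and add_to_etc_modules are illustrative and not part of cryptodev or Debian. It assumes the usual /proc/modules layout, where the first field of each line is the module name (the same data lsmod prints), and the Debian /etc/modules convention of one module name per line with # starting a comment.

```python
from pathlib import Path


def module_loaded(proc_modules_text, name="cryptodev"):
    """Return True if `name` is listed as a loaded module (step 10).

    Each line of /proc/modules starts with the module name, followed by
    size, use count, dependencies, state and load address.
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def add_to_etc_modules(etc_modules_text, name="cryptodev"):
    """Return /etc/modules contents with `name` on its own line (step 11).

    Idempotent: if the module is already listed (ignoring comments and
    surrounding whitespace), the text is returned unchanged.
    """
    for line in etc_modules_text.splitlines():
        if line.split("#", 1)[0].strip() == name:
            return etc_modules_text
    if etc_modules_text and not etc_modules_text.endswith("\n"):
        etc_modules_text += "\n"
    return etc_modules_text + name + "\n"


if __name__ == "__main__":
    proc = Path("/proc/modules")
    if proc.exists():
        print("cryptodev loaded:", module_loaded(proc.read_text()))
    else:
        print("/proc/modules not found (not running on Linux?)")
```

The second helper only returns the new file contents; writing them back to /etc/modules needs root, so the sketch leaves that to you (or to the manual edit in step 11).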

Future Work

  1. Package this up into a deb for easy install?
  2. Update my tor relay and measure the performance gain.
  3. Work on enabling the hardware random number generator.  UPDATE: This is now enabled with kernel version 3.13.

Without cryptodev

debian@arm:~/openssl-1.0.1e/cryptodev-linux-1.6$ time openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 2666405 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 64 size blocks: 905987 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 240811 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 1024 size blocks: 61145 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 7677 aes-128-cbc's in 3.00s
OpenSSL 1.0.1e 11 Feb 2013
built on: Mon Mar 18 21:48:12 UTC 2013
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr)
compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     14268.39k    19327.72k    20617.93k    20870.83k    20963.33k

real 0m15.114s
user 0m15.031s
sys 0m0.041s
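The summary row is derived from the “Doing ...” lines above it: blocks completed times block size, divided by the seconds OpenSSL reports, in units of 1000 bytes per second. A minimal Python sketch that recomputes the aes-128-cbc row from those lines; the regular expression is an assumption matched to the OpenSSL 1.0.1e output shown here and may need adjusting for other versions.

```python
import re

# Matches lines such as
#   Doing aes-128-cbc for 3s on 16 size blocks: 2666405 aes-128-cbc's in 2.99s
# Assumption: the format printed by OpenSSL 1.0.1e; other versions may differ.
DOING_RE = re.compile(
    r"Doing (?P<alg>\S+) for \d+s on (?P<size>\d+) size blocks: "
    r"(?P<count>\d+) \S+ in (?P<secs>[\d.]+)s"
)


def speed_table(output):
    """Map block size -> throughput in 1000s of bytes per second.

    Throughput = (blocks completed * block size) / seconds reported / 1000.
    """
    table = {}
    for m in DOING_RE.finditer(output):
        size = int(m.group("size"))
        processed = int(m.group("count")) * size
        table[size] = processed / float(m.group("secs")) / 1000.0
    return table


# The five "Doing" lines from the run without cryptodev.
RUN = """\
Doing aes-128-cbc for 3s on 16 size blocks: 2666405 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 64 size blocks: 905987 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 240811 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 1024 size blocks: 61145 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 7677 aes-128-cbc's in 3.00s
"""

if __name__ == "__main__":
    for size, kbps in speed_table(RUN).items():
        print(f"{size:>5} bytes: {kbps:.2f}k")
```

Running it reproduces the table, from 14268.39k for 16-byte blocks to 20963.33k for 8192-byte blocks. Feeding it the “Doing” lines from the cryptodev run below shows which divisor produced each of those figures.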

With cryptodev

debian@arm:/usr/local/ssl/bin$ time /usr/local/ssl/bin/openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 28166 aes-128-cbc's in 0.04s
Doing aes-128-cbc for 3s on 64 size blocks: 22445 aes-128-cbc's in 0.03s
Doing aes-128-cbc for 3s on 256 size blocks: 29933 aes-128-cbc's in 0.05s
Doing aes-128-cbc for 3s on 1024 size blocks: 16018 aes-128-cbc's in 0.04s
Doing aes-128-cbc for 3s on 8192 size blocks: 4861 aes-128-cbc's in 0.02s
OpenSSL 1.0.1e 11 Feb 2013
built on: Fri Oct 4 01:48:18 UTC 2013
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DHAVE_CRYPTODEV -DUSE_CRYPTDEV_DIGESTS -march=armv7-a -Wa,--noexecstack -DTERMIO -O3 -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     11266.40k    47882.67k   153256.96k   410060.80k  1991065.60k

real 0m15.326s
user 0m0.225s
sys 0m5.990s
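The “in 0.02s” style figures in the cryptodev run are tiny because openssl speed divides by CPU user time by default, and with cryptodev the work moves into the kernel and shows up as sys time (0.225s user vs 5.990s sys in the time output above). Its -elapsed option switches the divisor to wall-clock time. As a rough cross-check without rerunning, this Python sketch recomputes the 8192-byte figures for both runs using the reported seconds and, alternatively, the nominal 3 seconds per test. The 3-second wall-clock figure is an assumption, supported only by the roughly 15 s of real time for five block sizes.

```python
# Figures copied from the two 8192-byte runs above.
BLOCK = 8192
NOMINAL_SECS = 3.0  # assumption: each block size ran ~3 s of wall-clock time

RUNS = {
    # name: (blocks completed, seconds reported by openssl speed)
    "software": (7677, 3.00),
    "cryptodev": (4861, 0.02),
}


def kbytes_per_sec(blocks, secs, size=BLOCK):
    """Throughput in 1000s of bytes per second, as openssl speed reports it."""
    return blocks * size / secs / 1000.0


if __name__ == "__main__":
    for name, (blocks, reported_secs) in RUNS.items():
        reported = kbytes_per_sec(blocks, reported_secs)
        wall = kbytes_per_sec(blocks, NOMINAL_SECS)
        print(f"{name:>9}: {reported:12.2f}k per reported second, "
              f"{wall:10.2f}k per wall-clock second")
```

Under that assumption the cryptodev 8192-byte run works out to roughly 13273.77k per wall-clock second, far below the reported 1991065.60k. Rerunning with -elapsed would measure this directly instead of relying on the 3-second estimate.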

44 thoughts on “How to Enable Crypto Acceleration on the BeagleBone Black”

  1. Great article, thanks. I’m going to use my BBB for OpenVPN, so I’m seriously thinking about doing this, but I’m confused about something:

    Without cryptodev: “Doing aes-128-cbc for 3s on 16 size blocks: 2666405 aes-128-cbc’s in 2.99s” With cryptodev: “Doing aes-128-cbc for 3s on 16 size blocks: 28166 aes-128-cbc’s in 0.04s”

    Does that mean that only about 1/100 as many units were completed in those 3 seconds with cryptodev? I.e. 28166 vs. 2666405?

    Also: “Ok, we are done with the module, so go back and tar zxf openssl* and cd into that directory.”. I didn’t see a step where we downloaded openssl tarballs. Could you please elaborate? Thanks!

  2. 1.) That is a great question. I will have to do some research and get back to you on this b/c I’m not sure. I interpret these results as showing that cryptodev uses less CPU, but now I’m not sure how to read the speed. Thanks for looking at this closely.

    2.) Ah, thanks. I need to fix that. This is where I downloaded OpenSSL from: https://www.openssl.org/source/ . Pick “the latest” (in red) and then follow the directions above to configure and make.

    Be sure to capture CPU performance before and after you upgrade with OpenVPN. I’m very interested in how much it helps you!

      • Ok, I think I figured it out. This post helped. The hardware appears to be optimized for larger block sizes. On 8192 blocks, cryptodev screams: 1991065kB/s vs. 20963kB/s.

        Look at the 8192 line: without cryptodev, it did only 7677 blocks in 3 seconds, while cryptodev did 4861 in .02 seconds. So, cryptodev finished in 1/150th of the time (3 / .02 = 150). Extrapolating, cryptodev could perform 729150 rounds of 8192 blocks (about 95 times faster!).

        My guess is that the overhead of DMAing or whatever over to the crypto hardware is inefficient for small block sizes. For most SSL-like applications, I think the data will be using the large block sizes and be more efficient.

        Do you agree?

      • The .02 seconds presumably represents the CPU time, which with cryptodev enabled should indeed be almost nothing. I’m more interested in real, elapsed time. The “aes-128-cbc 11266.40k 47882.67k 153256.96k 410060.80k 1991065.60k” result looks really encouraging, as you say, for the larger block sizes. I’m most interested in using this for OpenVPN and HTTPS, both of which would use larger block sizes, I would guess.

        I’ve followed your instructions and installed this on my BBB, ran the two benchmarks, and got results very similar to yours. I’m also running some other tests: AES128, AES256, SHA1, and SHA256 on a 5 GB file. That will take a while to run, and will not be quite so accurate because of SD card access overhead, but it should be interesting. I’ll probably post the results on my Tiny Computers blog. http://www.bobjectsinc.com/tinycomputers/

      • Here’s an update: With my 5.2 GB file, if I do an aes-128 encryption on it with the old, non-accelerated openssl, it finishes in about 16 minutes. If I attempt that with the new one, it crashes my BBB hard, to the point where I can’t even ping it. My guess is that it tries to send the entire dataset to the cryptodev device, and that won’t scale. I could be wrong on that, but now I’m scared that this may pose a reliability problem, so I think I won’t be cutting over to the accelerated openssl any time soon. Fun Sunday afternoon project, though.

  3. Pingback: Spreading the good word about Tor and the BeagleBone Black | fortune datko

  4. Pingback: CryptoCape: A new BeagleBone Black Cape Concept | fortune datko

  5. Hey! Thanks for the great article. I’m running Arch Linux on my BBB and, with the help of your instructions, made PKGBUILDs (sort of compile-and-install scripts) for both cryptodev-linux and openssl with cryptodev enabled. Both compile and install just fine and I can load the module, but it seems that it’s just not working. I did an openssl speed test before and after for md5, sha and aes, but sadly the results don’t differ at all. lsmod shows cryptodev, and its “Used by” count increases by 1 when openssl is at work. My kernel is compiled with the right options set: CONFIG_CRYPTO_DEV_OMAP_AES=y and CONFIG_CRYPTO_DEV_OMAP_SHAM=y, and the kernel headers are those of the installed kernel. I’d really appreciate it if you could point me in the right direction. Really want to see this work. Thanks!

    • Cool! Ok, some things to check:

      - Are you using the version of OpenSSL that you compiled for hardware acceleration? Mine installed in a different location, and when I first tried it (using the default openssl), it did not work. I actually did an apt-get remove openssl just to be sure I was using the right one.
      - Are you using 1.0.1e?
      - If you run “openssl engine”, do you see “cryptodev”? If not, openssl does not recognize the module and it won’t use the HW acceleration.
      - When configuring openssl, I forgot to add “shared” to the configure line. Sorry. I will fix that now. Otherwise applications that want to use OpenSSL will have to be configured for static linkage.

  6. Pingback: Tales from the Crypt-o: Update on BBB Crypto Hardware Trials | fortune datko

  7. This doesn’t enable the crypto hardware. All you did was move the processing from user time to kernel time.

    It does add the cryptodev module, which moves the processing into the kernel, but it does not activate the crypto device in the hardware. I finally (today) got the hwrng running (had to patch about 4 files in the dts and omap directories of the kernel and rebuild it). The BeagleBone still has no real crypto hardware support, though. It’s all there, just not enabled yet with any openly available Linux drivers. TI has some drivers in an eval kit, but I believe their distribution is controlled.

    • Thanks for the comment. I’ve started to get the sense that TI is more tight-lipped about their crypto. I’ve seen questions on the TI boards about needing to talk to a Field Engineer for certain docs. Although, I did just find their Crypto Software Download Page the other day. It looks like they have some kernel drivers.

      I’m curious about your patches for the HWRNG; do you mind posting them somewhere? ;) The TI Crypto Guide mentions that one should enable the HWRNG character device in the kernel config, but I have not seen that option in the 3.8.13 series that Robert Nelson is using.

      Also, I do believe it’s using the crypto hardware. We seem to agree that cryptodev pushes the operations to the kernel, but at least for now, I am holding to my belief that it’s using the hardware (you seem to be more kernel aware than I, so I will let you refute my points). My belief is based on the following observations:

      1. Hardware support for the OMAP4 AES and SHA/MD5 engines is enabled in the 3.8.13 kernel:

        --- Cryptographic API
        [*] Hardware crypto devices --->
        --- Hardware crypto devices
        <*> Support for OMAP4 AES hw engine
        <*> Support for OMAP4 SHA/MD5 hw engine

      2. The throughput recorded in the tests above for AES-128-CBC on 8192-byte blocks is 1991065.60 kB/s. This is in the ballpark of what TI recorded for their tests, which was 1321096.53 kB/s.

      3. Computation time decreased with cryptodev. While I agree this could be due to an efficient kernel implementation, the time decrease is drastic and also matches TI’s metrics. Without cryptodev, the run took 15.031s user + 0.041s system; with it, 0.225s user + 5.990s system. Real time is about the same for both; I think this is due to how OpenSSL is using timers.

      4. I believe TI’s OMAP driver is included and available to the kernel. I have not studied cryptodev’s source, so this is where I’m making a leap of faith: I believe that cryptodev is using said driver.

      So, I agree that there is a possibility that this is just shifting execution from user space to kernel space. If so, my guess is that it’s because cryptodev is not calling the OMAP AES driver. However, based on the observations above, and mainly because my test results match what TI has published, I believe that it is.

      Have you looked at cryptodev to see how it uses kernel crypto drivers?

    • I will also agree, and after my long response I realize this may have been your main point, that my title is misleading. :)

  8. To make your packages ‘easy’, I would recommend using ‘checkinstall’, which will automagically create the .deb files for both openssl and the cryptodev module. Might be even better if we had a place to put the ‘deb’ files!

  9. Did you ever get hwrand working? Would be cool for doing lots of key stuff (like a cert generator appliance?). BTW: if one is using ‘browser’ SSL, is the default block size 8k? What if I set the cipher to only be aes128-cbc?

    • I did not. I sent a message to TI, so hopefully they point me in the right direction. I think the block sizes are a bit confusing. AES always uses a block size of 128 bits, so I think the “block sizes” in the OpenSSL test represent how much data is passed to the cipher at a time.

  10. Found a way to actually edit the original ‘openssl’ sources that come with debian. Basically

    Assuming you are running in /tmp
    apt-get build-dep openssl
    apt-get source openssl
    cd into the directory that apt-get source creates
    go to the ‘debian’ directory
    edit the ‘rules’ file and change the CONFARGS line to look like the following
    ------------------
    CONFARGS = --prefix=/usr --openssldir=/usr/lib/ssl --libdir=lib/$(DEB_HOST_MULTIARCH) no-idea no-mdc2 no-rc5 zlib enable-tlsext no-ssl2 -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS
    ------------------
    Finally, run the following to build: (you will need to install the debian build tools to do this)
    debuild -us -uc -b

    This way you have the ‘original’ sources as included and intended for the ‘debian’ distro. Provides you with ‘full’ crypto, and you stay ‘on the rails’ as it were. I will send you the modified ‘approved’ .deb, but if you want to do yourself, here it is.

    • Nice. And thanks for the other links as well. My guess is that the HWRNG just hasn’t been ported to the 3.8 kernel yet…

      I’m currently working on a patch for OpenSSL’s cryptodev engine that will let it use more of the AM335x AES modes.

  11. I can also send you ‘ssl bump’ enabled squid if you want…..what a pain in the A$$. Squid 3.3.9…..compiled against your (and my!) crypto accelerated openssl….!!

    • How’s sslbump working for you? I noticed some issues with Tor and I’m not sure if it was the OpenSSL, cryptodev, or Tor. That and I have to work on a few other projects at the moment.

    • No, but I’m getting breakout boards fabbed that will have a SHA chip on them as well as a HWRNG. I’ll write a user-space app to access the device over I2C. It won’t be as fast as the TI one, but it would be a good seed…

      • Cool! I have my test batch of cape-lets on their way. If they go well, I’ll order a limited-run batch and make them available. Probably around mid-december. It’ll just be the SHA capelet at first, but the other features should follow pretty steadily after that one.

        Thanks for the reblog.

        On Mon, Nov 18, 2013 at 9:53 PM, fortune datko

  12. Hello,
    currently I am using a Raspberry Pi as an OpenVPN server, and with AES-128-CBC I am hitting the CPU limit at ~15 Mbps (overclocked @ 900 MHz).

    Has anybody measured what throughput a BeagleBone with the HW crypto module enabled and used by OpenSSL would achieve?
    I have a 70/70 Mbps link and I use VPN quite often. I rarely need much speed, but when I do, I need a lot (file transfers).

    I could not find any results on the Internet.
    Would anybody be willing to test it, please?

    Thank you very much in advance

    • Hey Martin,

      I haven’t done a good OpenVPN test yet. I recompiled wget with my cryptodev-openssl, but I didn’t see any noticeable difference. I’ll have to set up a local test server to eliminate the latency across the Internet and give it a shot.

      Maybe this weekend? If I run the tests, I’ll add a blog post.

      • Thank you very much in advance,
        I really look forward to the results and I hope you’ll find the time to test it and post the results soon.

        Best regards,
        Martin

      • Have you had time to test the OpenVPN performance yet?
        I am very interested to see the result and I would be very grateful if you could post the results soon :)
        Thanks in advance

      • Martin,

        See my latest post. Hopefully that does the trick. The bandwidth numbers are a bit low, but I think that is because my client (BBB) is in the U.S. and the server was in Europe. From a CPU point of view, it seems to do pretty well.

    • I did do connection tps measurements: using 128-bit AES with SHA1, we got 450 tps; without, it was 200 tps. I suspect that the actual throughput won’t vary that much, since once the connection is established and the symmetric cipher is in use, there is not much work.

  13. Pingback: BeagleBone Black OpenVPN Performance | fortune datko

  14. Reblogged this on Cryptotronix, LLC and commented:

    The BeagleBone Black’s TI AM335x processor has cryptographic modules built in. My tutorial shows how to configure OpenSSL to use the onboard AES accelerator.

  15. Any updates on the CryptoCape / HWRNG? I did see some new kernel images out, but I have yet to get them working (I ended up with a non-bootable 3.10.xx after trying the kernel FAQ). Oh well. I could also Skype you; I got some wonderful packaging help on making openssl much more friendly.
