Tag: Programming
SHA-3 Finalists: PHP Speed Comparison
by Bryan on Mar.04, 2011, under Programming, Security
Background
As everyone interested in cryptology knows, NIST has been running a cryptographic hash algorithm competition to determine the successor of SHA-2. The chosen algorithm will be aptly named SHA-3.
On December 9, 2010, NIST selected five SHA-3 finalists (BLAKE, Grøstl, JH, Keccak, and Skein) to advance to the third and final round of the competition, which ended the second round.
As I’ve said in other places, the SHA-3 competition is extremely important because it draws the entire cryptology industry together to beat on the submitted algorithms for three years. You can be pretty confident in any algorithm that advances to the final round. But what the competition ultimately determines is which function is the “Jack of all trades”. For those of us who do large-scale database operations where hashes are part of the works, a high security margin and speed are more important than the number of CPU cycles and bits of memory saved, and how well it can be implemented in embedded systems. So I set out to test the five finalist hashes in a typical web application environment.
Why I created the test
My foray into this test began when I wrote a quick CLI PHP script to download photos from my cell phone. As part of the copy process, I naturally built in a checksum routine to verify that each file was copied correctly. I have been an avid follower of the SHA-3 competition from the beginning, and I had read good things about the Skein function, so I decided to implement it in the script just for fun.
Right around the same time, NIST published its rationale for selecting the five finalists in the competition. After reading through the rationale, I became really curious to see how the functions would stack up against one another in a PHP environment. So after creating PHP extensions for each of the finalists that didn’t already have one, I modified my download script to do some hash benchmarking, and ran the test.
The Test
After using the script to download the photos off of my cell phone, I decided that the amount of data (about 40 MB) just wasn’t large enough to give me a good benchmark. So I decided to run the script against a whole month of exported JPG photos from my DSLR, which ended up being nearly 1 GB of data (154 files @ 4-6 MB each). Since each file is hashed twice (once at the source and once at the destination), we’re approaching 2 GB of data hashed each time the operation runs. Since we are only benchmarking the performance of the hash functions, all of the files were copied once and verified a couple of times before the official timing began.
Here is a basic overview of how the script works:
- List the source directory contents recursively, looking for .jpg files
- Iterate through the list
- Get the file creation date
- Build a destination path based on the file creation date and file name
- If the destination file does not exist, or hashes of the files do not match, copy the file
- If hashes of the files still don’t match, report a failure
Click here for the source code.
Since the files have already been copied and verified, as mentioned above, the file copy and the last verify never happen in the speed comparison test. It essentially loops through all of the files, builds hashes of the source and destination files, and verifies that they all match.
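As a rough illustration of the steps listed above, here is a minimal sketch of such a verify-and-copy loop. It is not the original script: the function names `filesMatch` and `syncPhotos`, the `YYYY/MM/DD` destination layout, and the use of the modification time as a stand-in for the creation date are all assumptions made for the example, and it assumes PHP 7 or later with only the built-in `hash_file()` algorithms.

```php
<?php
// Hypothetical helper: true when the destination exists and both files
// produce the same digest. Each call hashes the source and the destination.
function filesMatch(string $src, string $dst, string $algo = 'sha512'): bool
{
    return is_file($dst)
        && hash_file($algo, $src) === hash_file($algo, $dst);
}

// Hypothetical driver: walks $srcDir recursively, copies each .jpg into a
// date-based folder under $dstDir when needed, and returns the source paths
// that still fail verification after copying.
function syncPhotos(string $srcDir, string $dstDir, string $algo = 'sha512'): array
{
    $failures = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($srcDir, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($files as $file) {
        if (!$file->isFile() || strtolower($file->getExtension()) !== 'jpg') {
            continue;
        }

        $src = $file->getPathname();
        // Assumption: modification time stands in for the creation date.
        $dir = $dstDir . '/' . date('Y/m/d', $file->getMTime());
        $dst = $dir . '/' . $file->getFilename();

        if (filesMatch($src, $dst, $algo)) {
            continue; // already copied and verified; nothing to do
        }

        if (!is_dir($dir)) {
            mkdir($dir, 0777, true);
        }
        copy($src, $dst);

        if (!filesMatch($src, $dst, $algo)) {
            $failures[] = $src; // hashes still differ after the copy
        }
    }

    return $failures;
}
```

Once every file has been copied, a second call such as `syncPhotos('/path/to/source', '/path/to/destination')` only takes the "already copied and verified" branch, which is the steady state the benchmark measures.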
For the purposes of this test, I needed to be able to keep track of the exact number of bytes hashed (for verification between runs) and the exact amount of time spent actually hashing data so we wouldn’t have to worry about other operations clouding the results. To that end, I built a class with an internal counter for each. The class also contains an isolated hash wrapper function which only accepts the raw data to be hashed, increments the counters, and passes the data on to the hash function configured at the object level.
A new object is created and destroyed for each hash function being tested, per round. The wrapper function increments the counters for the lifetime of the object. The number of bytes hashed is an explicit count of the bytes fed to the wrapper function. The time spent hashing is calculated by getting the microsecond time stamp immediately before the hash function is executed, and once again immediately after. The former is subtracted from the latter, and the internal counter is incremented by the result.
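The counting and timing wrapper described above could look roughly like the following. This is a sketch under assumptions, not the class from the original script: the class name `HashTimer` and its method names are hypothetical, it assumes PHP 7.4 or later, and it only wires up the built-in `md5` and `sha512` algorithms because the function names exposed by the SHA-3 finalist extensions are not given here.

```php
<?php
// Hypothetical wrapper: counts bytes and accumulates time spent inside the
// configured hash callable, and nothing else.
final class HashTimer
{
    /** @var callable maps raw data to a digest string */
    private $hashFn;
    private int $bytesHashed = 0;
    private float $secondsHashing = 0.0;

    public function __construct(callable $hashFn)
    {
        $this->hashFn = $hashFn;
    }

    // Accepts only the raw data, updates both counters, returns the digest.
    public function hash(string $data): string
    {
        $this->bytesHashed += strlen($data);

        $start = microtime(true);             // timestamp just before hashing
        $digest = ($this->hashFn)($data);
        $this->secondsHashing += microtime(true) - $start; // and just after

        return $digest;
    }

    public function bytesHashed(): int
    {
        return $this->bytesHashed;
    }

    public function secondsHashing(): float
    {
        return $this->secondsHashing;
    }
}

// One fresh object per function under test.
$functions = [
    'MD5'     => function (string $d): string { return hash('md5', $d); },
    'SHA-512' => function (string $d): string { return hash('sha512', $d); },
];

foreach ($functions as $name => $fn) {
    $timer = new HashTimer($fn);
    $timer->hash(str_repeat('x', 1024 * 1024)); // stand-in for file contents
    printf("%-8s %d bytes in %.6f s\n", $name, $timer->bytesHashed(), $timer->secondsHashing());
}
```

Taking a callable keeps the timed region down to a single function call, so file reads and comparisons stay outside the measurement. Comparing `bytesHashed()` across runs gives the same sanity check mentioned above: every function should have been fed exactly the same number of bytes.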
The Results
To establish a baseline, I ran iterations of MD5 and SHA-512. MD5 has been the hash of choice for the past few years where speed was a major concern. Unfortunately, MD5 is now considered to be cryptographically broken, but it served its purpose here in determining a reasonable floor for speed. I chose SHA-512 as the second baseline because SHA-2 is the current standard, and I chose its 512-bit mode because that is what the new algorithms will be using.
I ran the entire script, which performs the verification of the entire dataset for all seven hash functions (MD5, SHA-512, BLAKE, Grøstl, JH, Keccak, Skein), five times.
This test was performed on a 64-bit Ubuntu 10.10 installation running PHP 5.3.3-1ubuntu9.3 in CLI mode. The CPU is an Intel Core2 Duo T9300 @ 2.50GHz and the machine has 4 GB of memory installed. During the entire duration of the test, the load average of the machine peaked at 1.2, CPU usage peaked at 85%, and memory usage peaked at 25%.
| Function | Round 1 (s) | Round 2 (s) | Round 3 (s) | Round 4 (s) | Round 5 (s) |
|---|---|---|---|---|---|
| MD5 | 5.731552 | 5.729477 | 5.817808 | 5.813912 | 5.740509 |
| SHA-512 | 14.610088 | 14.269413 | 14.222281 | 14.468436 | 14.378429 |
| Skein-512 | 6.952610 | 6.767148 | 6.858372 | 6.877997 | 6.812982 |
| Keccak-512 | 8.023958 | 7.778949 | 7.952572 | 7.887774 | 7.886457 |
| JH-512 | 8.195324 | 7.830080 | 7.916424 | 8.040076 | 7.995283 |
| Grøstl-512 | 8.192576 | 8.121383 | 8.205048 | 8.063461 | 8.326136 |
| BLAKE-512 | 9.894579 | 9.715329 | 9.627831 | 9.588126 | 9.610026 |
Click here for the raw results.
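For anyone who wants a starting point for comparing the numbers, the short sketch below averages the five rounds per function and expresses each mean relative to MD5. The timings are copied from the table above; treating them as seconds is an assumption (PHP's `microtime(true)` returns seconds), and the variable names are only illustrative.

```php
<?php
// Per-round timings copied from the results table (assumed to be seconds).
$rounds = [
    'MD5'        => [5.731552, 5.729477, 5.817808, 5.813912, 5.740509],
    'SHA-512'    => [14.610088, 14.269413, 14.222281, 14.468436, 14.378429],
    'Skein-512'  => [6.952610, 6.767148, 6.858372, 6.877997, 6.812982],
    'Keccak-512' => [8.023958, 7.778949, 7.952572, 7.887774, 7.886457],
    'JH-512'     => [8.195324, 7.830080, 7.916424, 8.040076, 7.995283],
    'Grøstl-512' => [8.192576, 8.121383, 8.205048, 8.063461, 8.326136],
    'BLAKE-512'  => [9.894579, 9.715329, 9.627831, 9.588126, 9.610026],
];

// Mean time per function across the five rounds.
$means = [];
foreach ($rounds as $name => $times) {
    $means[$name] = array_sum($times) / count($times);
}

asort($means); // sort by mean time, fastest first, keeping the names as keys

foreach ($means as $name => $mean) {
    printf("%-11s mean %.3f  (%.2fx MD5)\n", $name, $mean, $mean / $means['MD5']);
}
```

A mean over five rounds is a crude summary; the raw results are the better source for anything more careful.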
Well, that’s it. I’ll leave the analysis of what the results mean to the reader.
Oh, and if you’re interested in the PHP extensions I wrote, they’re available at: https://github.com/archwisp
Compile Simple C Programs With Vim
by Bryan on Nov.30, 2010, under Programming
When working on simple C programs, I like vim to produce a compiled output-file named the same as the source file with the .c extension removed when I issue the :make command. For instance, if I create a source file named “helloworld.c”, I want the compiled binary to be named “helloworld”.
UPDATE:
I discovered this simpler command which does not have any external dependencies. Add it to your ~/.vim/ftplugin/c.vim file.
set makeprg=gcc\ %\ -g\ -o\ %:r
Here is the old command I came up with which uses Perl for the substring extraction. I’m leaving this here as a reference for how you might do other things.
set makeprg=gcc\ -g\ -o\ $(perl\ -e\ 'print\ substr(%,0,-1)')\ %
Vim and .sql files
by Bryan on Nov.18, 2010, under Programming
Hey, have you opened a .sql file in vim and realized with great irritation that you cannot use the left and right arrow keys to navigate in insert mode? Well, here is how you fix it.
Add the following to your ~/.vimrc or ~/.vim/ftplugin/sql.vim file:
let g:omni_sql_no_default_maps = 1
CVE-2010-3765 (firefox)
by Bryan on Oct.28, 2010, under Programming, Security
This is not good.
Unspecified vulnerability in Mozilla Firefox 3.5.x through 3.5.14 and 3.6.x through 3.6.11, when JavaScript is enabled, allows remote attackers to execute arbitrary code via unknown vectors, as exploited in the wild in October 2010 by the Belmoo malware.
National Vulnerability Database
via CVE-2010-3765.
The recent prevalence of this sort of browser bug is really disconcerting. I didn’t realize the severity of the problem until I recently investigated how one of these exploits works, and let me tell you, it’s not pretty. If the browser is hijacked, the web developer has essentially no control over what happens after the page is loaded. For instance, you could visit your bank’s website and receive the actual web page served over SSL/TLS with a valid certificate. Then the exploit steps in and overloads (or creates) the JavaScript form-handling function. From that point on, anything you enter in the form is known to the operator of the exploit.
With AJAX functionality becoming the norm, it’s also becoming more difficult to disable JavaScript in the browser and still enjoy your web browsing experience. On top of that, who knows whether JavaScript is actually disabled if the browser has been compromised?
Unfortunately, Mozilla is not the only one to blame here. Microsoft’s Internet Explorer, Google’s Chrome, and Apple’s Safari browsers have all had this type of vulnerability reported within the last few months. The other vulnerabilities may not have been exploitable by a “remote attacker”, but the exploit I mentioned was very successful as a local exploit. How many of you are certain you will be protected from a 0-day exploit that then hijacks your browser? Remember, this is a user-space vulnerability; no root required here.
So this is a revelation for me. I’ve been a web developer for 10 years and I always considered web applications relatively safe and secure when written and implemented properly. Unfortunately, the old axiom, “Nothing is secure”, remains true. The fact that a web site may be completely free of bugs and vulnerabilities (however unlikely that may seem) is completely negated by the fact that a web browser is required to interact with the application. And due to the nature of the Internet, the browser in use is mostly controlled by the user. I say mostly for two reasons: web developers have some rudimentary tricks for detecting browsers, though they can all be circumvented if the user has the technical prowess, and users may be on a computer controlled by their employer, where they may or may not have permission to use their browser of choice.
So ultimately, the security of the data passing between the user and the web application rests in the hands of the browser developer. We can do everything possible to “secure” the client machine and the web site, but if the browser is flawed, all is for naught.
Happy days.
Software Engineering: Over-Engineering
by Bryan on Sep.18, 2009, under Programming
I’ve been a software developer for 10 years and since I can remember, I’ve heard this term being passed around as if it were the bane of any software project: over-engineering. In the early years, I was a novice in a shop full of seasoned developers, so I absorbed every bit of sage advice like a sponge. This idea that you shouldn’t over-think things because you’ll end up with a system that is far more complex than it needs to be made sense. Unfortunately, it seems as though the original intent of this phrase has been lost to the current generation of programmers. From my experience, it seems that far too many developers have interpreted this phrase to mean “do no engineering at all”. Complex systems are created one line at a time by trial and error. Let me point out right away that this article is by no means an attack on people developing software this way. This is my attempt to reach out and help improve the state of the industry. Hopefully, it will make all of our lives a little easier.
When I re-examine the phrase now, the entire concept of over-engineering seems laughable. The purpose of engineering is to design an optimal solution to a practical problem. How can you make a system worse by thinking more about how it can be improved? I think this oxymoron of a concept has stemmed from two possibilities. The first is that these developers don’t understand the role of engineering, and the second is that they don’t understand the project requirements.
I’ve experienced the first condition firsthand. When I was a pup, I thought software engineering was creating software from scratch. I created every bit of it, so I must have been the engineer! Unfortunately, it turns out that those projects simply had no engineer at all. There weren’t even specific requirements of the system before I started writing code. I had a rough idea of what it needed to do, so I created it the best way I knew how: by trial and error. This was a gross misunderstanding of the role of engineering on my part. I’ve since learned that software engineering is much like any other type of engineering.
One major qualification required of an engineer is a deep understanding of all of the available materials. If an electrical engineer doesn’t understand the difference between paper and plastic capacitors, their circuit may not hold up under certain conditions. If a mechanical engineer doesn’t understand the tensile strength of iron vs aluminum, they may not make the best decision when deciding which to use. In either case, it’s possible that the person in question could design a solution to the problem; it may even turn out to be a good one. But a true engineer would understand all of the costs, strengths, and weaknesses of all of the available materials. They would be able to make a scientific decision about which should be implemented. This same principle is applied to software engineering. If you don’t understand the costs, strengths, and weaknesses of the available technologies, how can you make an informed decision about which to implement?
Another major qualification of an engineer is to understand the problem in its entirety. If an architect is asked to design a home but does not consider whether it will be built on an island where little electricity is available, in a desert, or in the middle of a metropolitan city, it’s likely that something won’t work out. This is an extremely obvious example, but the point I want to illustrate is that the person who wants the house built may not understand all of the issues that need to be considered. It is the job of the engineer to be able to see these issues in the design stage and raise the appropriate questions when necessary. Otherwise, you may end up with a very expensive sculpture. In the realm of software, a client may ask for a custom CRM package but doesn’t mention that they want it to be usable offline. A software engineer needs to be able to foresee this type of issue and raise questions. I’ve seen too many systems built where nobody stepped back to look at the big picture. There would be an immense focus on getting a small task done and when it was finished, it turned out that it couldn’t be implemented as needed or someone on the next team had already solved the problem.
The final qualification I’ll highlight is the ability to create documentation. You could engineer the perfect system but if it’s not spelled out specifically, details get lost. By the time a project enters the development stage, all of the questions need to be answered in writing. Nobody would expect a contractor to build a structure without a blueprint. Neither would you build a complex circuit without a schematic. The same should go for software. Every detail should be spelled out on paper before anyone even starts dreaming of code. There will be plenty of revisions, for sure. But the problems found at this stage will take far less time to correct than in the development stage.
The qualifications I’ve mentioned are things that we need to start seeing more of in software development. It’s a cop-out to say, “It’s my code, nobody else will ever touch it,” or, “I don’t have time to create documentation.” I’ve inherited too many extremely complex projects left behind by people who said these very things. I’ve even been guilty of doing those things in the past. It’s time to evolve. To me, the fact that the National Vulnerability Database posts 20+ new vulnerabilities a day illustrates that we’re doing something wrong.
Let’s make a change.
My hope for this site is that my thoughts and experiences find their way to those who can make use of them. I've been a professional software developer and amateur photographer since 1999 and an amateur musician for nearly 20 years. I have done a fair share of paying gigs and teaching, but nowadays, I spend most of my music-related time in my home studio. Most of my writing will fall into the main site categories, but I also enjoy reading, running, and auto racing, so look out for posts on each of those from time to time. Enjoy.