Sunday, March 29, 2015

The Free, Easy, and Secure Way to Encrypt Shared Dropbox Folders on Your Mac

My wife and I have been searching for months for a service to allow us to store our sensitive files in Dropbox securely. There are a lot of services out there that promise to do this well, but frankly, it’s hard to trust that our data would be safe if they happened to be hacked or compromised. I therefore went looking for a piece of software that was:

  1. Vetted as being secure.
  2. Ubiquitous. Something I wouldn’t need to worry about installing or finding on a future machine.
  3. Compatible with Mac OS X.

TrueCrypt, an open-source file encryption product, seemed like it might fit the bill. However, I must have had my head in the sand about a year ago (sure enough, I was off the grid when it happened), because TrueCrypt was abandoned by its developers, who left a very vague / scary message on their website:

WARNING: Using TrueCrypt is not secure as it may contain unfixed security issues

Ok then. So that in itself wasn’t reason enough for me to give up. I went ahead and downloaded TrueCrypt from some random site, verified the checksum (remind me which of my family members is going to know how to do this?), and installed it (in order to do this, I had to temporarily disable OS X’s protection against unsigned software in “Security & Privacy” in System Preferences).

That’s a lot of steps for someone who just wants to share encrypted files. And more steps are bad—I didn’t want people I’m sharing stuff with to just give up. We’d be back to square one.

Not to mention, TrueCrypt doesn’t offer dynamically sized volumes. You have to specify the size of the volume at creation time. This means that if the files in a volume you’re sharing hit that size limit—too bad. You now need to create a new, larger volume and move all of the existing files from the previous volume into it. That’s a drag. And sort of dumb.

Off I went looking for an alternative. “Wait”, I thought, “doesn’t OS X provide full-disk encryption via FileVault?”. Well, I didn’t actually ask myself that, but I concluded that there must be a way to do this just using a stock Mac OS X installation.

Sure enough, after some digging, I found this on Apple’s support site: How to create a password-protected (encrypted) disk image. Boom! There we have it.

If you follow the instructions (which are quite simple, really), you’ll have a “file” (which is actually a “sparse image”—sort of like a volume) which you can drop into a shared Dropbox folder and share with anyone you want. To add files to or remove files from the encrypted folder, you double-click it, type in the password, and then use it like you would any other volume (like your hard drive or a USB stick). Once you’re done modifying files, just “eject” it. Easy peasy. Dropbox syncs it immediately.
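If you’d rather skip the point-and-click route, I believe the same kind of image can be created from the Terminal with hdiutil. The size, volume name, and destination path below are just examples; adjust them to taste:

```shell
# Create an AES-256-encrypted, auto-growing sparse image.
# hdiutil prompts for the password to use.
hdiutil create -size 1g -type SPARSE -fs "Journaled HFS+" \
    -encryption AES-256 -volname "Secrets" \
    ~/Dropbox/Secrets.sparseimage
```

Double-clicking the resulting .sparseimage file mounts it just like the one created through Disk Utility.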

Oh—and the folder resizes automatically! Another huge plus over TrueCrypt.

So, in short, you can create an AES-encrypted folder, inside your Dropbox, and anyone else who uses a Mac will be able to add or remove files from it without needing to install any third-party software. And, it’s free.

Go forth and encrypt!

P.S. Windows users might have similar luck with BitLocker.

P.P.S. Jamie Phelps pointed me to this AgileBits article regarding a downside of using sparse bundles and Dropbox (note–a sparse image and a sparse bundle are not the same thing, so the two may not have the same downsides). Knox stores its vaults with OS X sparse bundles, and those have been shown to have issues syncing over Dropbox when simultaneous edits are made. I’ve yet to see this be a problem with using sparse images (since OS X—and Dropbox—treats a sparse image as a single file), but it might be an issue. Jamie recommends SafeMonk as an alternative.

Monday, September 8, 2014

Supercharge Your Python Shell

I’ve been using and working with Python in a professional context for around 8 years. So it came as sort of a shock that something so useful, so obvious, was unfamiliar to me until around a week and a half ago.

It involves a little file called .pythonrc that lives in your home directory. To get it to work, you’ll need to define an environment variable called PYTHONSTARTUP, like so:

export PYTHONSTARTUP="$HOME/.pythonrc"

(Major hat tip to kasnalin on Reddit for pointing out that I’d forgotten to include this important piece of info in an earlier version of the post.)

If you’re a Python developer, no doubt you open up a Python shell countless times per day. Whether you use the regular Python shell, or another one such as IPython, it’s all the same. Open up the shell, import a few useful modules, and start playing around.

Maybe you find yourself importing the same modules over and over again. Maybe you leave a single shell open all the time since it’s such a pain to re-type those commands.

I spend a lot of time working with Django, so I’d gotten pretty used to typing ./ shell and then from app import models to start playing around with objects in my database.

Well, I got to wondering what this .pythonrc could do to help me out with this, so I dumped in the following code:

    try:
        from app import models
        from django.conf import settings
    except ImportError:
        print("\nCould not import Django modules.")
    else:
        print("\nImported Django modules.")

Not expecting this to work at all, I ran my trusty Django shell, and voilà:

$ ./ shell

Imported Django modules.
Python 2.7.6 (default, Jan  6 2014, 13:22:56)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

It’s like magic. No more reverse searching my shell history to import commonly-used modules. If I find myself using one often enough, I just add it into my .pythonrc and it’s there for me every time.

I play around with Redis a bunch, so here’s another chunk of code I plopped in there.

from redis import StrictRedis as Redis

r = Redis()

Now, I can just open up the shell and use r to play with Redis.
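One caveat: if the redis package isn’t installed on a given machine, that bare import will raise an ImportError and cut the rest of your .pythonrc short. A guarded variant (same idea, just hedged like the other blocks in the file):

```python
# Guarded Redis setup for .pythonrc: report a missing package
# instead of aborting the whole startup file.
try:
    from redis import StrictRedis as Redis
except ImportError:
    r = None
    print("Could not import redis.")
else:
    # redis-py doesn't actually connect until the first command is
    # issued, so constructing the client is safe even when no
    # server is running.
    r = Redis()
```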

Here’s another one I’m pretty fond of. I’m a big fan of tab-completion and readline support. So I put this in my .pythonrc too:

    try:
        import readline
    except ImportError:
        print("Module readline not available.")
    else:
        import rlcompleter
        if 'libedit' in readline.__doc__:
            readline.parse_and_bind("bind ^I rl_complete")
        else:
            readline.parse_and_bind("tab: complete")

Works like a charm.
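If you’re curious what rlcompleter is actually doing under the hood, you can poke at its Completer class directly. Given a namespace dict, it suggests names matching a prefix (the names here are arbitrary, made up for illustration):

```python
import rlcompleter

# Completer suggests names from the namespace you hand it
# (plus Python keywords and builtins).
completer = rlcompleter.Completer({'parse_date': 1, 'parse_args': 2})
print(completer.complete('parse_d', 0))  # -> parse_date
```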

Below is my full .pythonrc, in all of its glory (also available as a Gist).

# vim: set ft=python :

from __future__ import print_function

import json
import sys
import datetime

from redis import StrictRedis as Redis

r = Redis()

try:
    import readline
except ImportError:
    print("Module readline not available.")
else:
    import rlcompleter
    if 'libedit' in readline.__doc__:
        readline.parse_and_bind("bind ^I rl_complete")
    else:
        readline.parse_and_bind("tab: complete")

try:
    from app import models
    from django.conf import settings
except ImportError:
    print("\nCould not import Django modules.")
else:
    print("\nImported Django modules.")

    from dateutil.parser import parse as parse_date
except ImportError:
    print("\nCould not import dateutil.")

Monday, September 1, 2014

Getting Started with Django and PostgreSQL Full-Text Search

After hearing a few months back that PostgreSQL had built-in full-text search support, I had been continually searching for reasons to take advantage of it. A few months went by, and when one of our clients at Lionheart started running into issues with slowness using the Django admin’s built-in search support, I knew I had a solid candidate. After all was said and done, searches returned results approximately 20x faster than vanilla Django. Here’s how you can do it too.

The Basics

One of the great things about PostgreSQL full-text search is that there isn’t too much to learn if you’re already pretty familiar with search engines and how they work. The main idea is that you generate an index using a document containing terms you’d like to search on. PostgreSQL provides a lot of flexibility with regards to how you choose to do this, but I opted to store the documents as a field called fts_document on the table we wanted to search on.

PostgreSQL stores search data in a data type called a tsvector. Here’s how the PostgreSQL documentation describes it:

A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word. Sorting and duplicate-elimination are done automatically during input.

Here’s an example of a sentence converted to a tsvector:

=# SELECT to_tsvector('simple', 'The quick brown fox jumps over the lazy dog.');
 'brown':3 'dog':9 'fox':4 'jumps':5 'lazy':8 'over':6 'quick':2 'the':1,7
(1 row)
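As a mental model (not PostgreSQL’s actual implementation), you can think of the 'simple' configuration as lowercasing the text, splitting out words, and recording the 1-based position(s) of each distinct lexeme. A toy Python sketch:

```python
import re
from collections import defaultdict

def toy_tsvector(text):
    """Very rough imitation of to_tsvector('simple', ...)."""
    positions = defaultdict(list)
    for i, word in enumerate(re.findall(r"\w+", text.lower()), start=1):
        positions[word].append(i)
    # A tsvector is a sorted list of distinct lexemes.
    return dict(sorted(positions.items()))

print(toy_tsvector("The quick brown fox jumps over the lazy dog."))
# {'brown': [3], 'dog': [9], 'fox': [4], 'jumps': [5],
#  'lazy': [8], 'over': [6], 'quick': [2], 'the': [1, 7]}
```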

When you want to search for something, you create what’s called a tsquery.

A tsquery value stores lexemes that are to be searched for, and combines them honoring the Boolean operators & (AND), | (OR), and ! (NOT). Parentheses can be used to enforce grouping of the operators.

Here’s an example:

dan=# SELECT to_tsquery('simple', 'fox | jumps');
 'fox' | 'jumps'
(1 row)

When you want to test for a match, you use the @@ operator.

dan=# SELECT to_tsvector('simple', 'The quick brown fox jumps over the lazy dog.') @@ to_tsquery('simple', 'fox');
 ?column?
----------
 t
(1 row)

It’s a match!

The @@ operator also provides a shorthand whereby you don’t need to use the to_tsvector / to_tsquery functions explicitly. E.g.,

dan=# SELECT 'The quick brown fox jumps over the lazy dog.' @@ 'fox';
 ?column?
----------
 t
(1 row)

Sweet! However, you might be confused by the following:

dan=# SELECT 'The quick brown fox jumps over the lazy dog.' @@ 'fox | random';
 ?column?
----------
 f
(1 row)

Huh? We’re searching for either the term “fox” or “random”. This doesn’t seem right…

Well, it turns out that when you specify plain text as an argument to the @@ operator, it wraps it in plainto_tsquery, which functions a little bit differently than the to_tsquery function we’ve been playing around with above. Notably, it ignores any operators in the input (|’s, parentheses, etc.) and simply ANDs all of the terms together. Personally, I don’t find it to be all that useful.
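In Python terms, plainto_tsquery behaves roughly like the sketch below (again, a mental model rather than the real implementation):

```python
import re

def toy_plainto_tsquery(text):
    """Rough imitation of plainto_tsquery: operators in the input
    are treated as punctuation and dropped, and the remaining
    words are simply ANDed together."""
    words = re.findall(r"\w+", text.lower())
    return " & ".join("'{}'".format(w) for w in words)

print(toy_plainto_tsquery("fox | random"))  # -> 'fox' & 'random'
```

That’s why the 'fox | random' query above fails to match: the | is dropped, and the document would need to contain both terms.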

Also, as you might have noticed above, I provided an additional argument, ‘simple’, to both the to_tsvector and to_tsquery functions. This specifies the simple text search configuration. Other text search configurations are language-based and strip out stopwords and other lexemes based on the language you’re indexing. Here are a few of the options available on my machine.

dan=# \dF
               List of text search configurations
   Schema   |    Name    |              Description
------------+------------+---------------------------------------
 pg_catalog | danish     | configuration for danish language
 pg_catalog | dutch      | configuration for dutch language
 pg_catalog | english    | configuration for english language
 pg_catalog | finnish    | configuration for finnish language
 pg_catalog | french     | configuration for french language
 pg_catalog | german     | configuration for german language
 pg_catalog | hungarian  | configuration for hungarian language
 pg_catalog | italian    | configuration for italian language
 pg_catalog | norwegian  | configuration for norwegian language
 pg_catalog | portuguese | configuration for portuguese language
 pg_catalog | romanian   | configuration for romanian language
 pg_catalog | russian    | configuration for russian language
 pg_catalog | simple     | simple configuration
 pg_catalog | spanish    | configuration for spanish language
 pg_catalog | swedish    | configuration for swedish language
 pg_catalog | turkish    | configuration for turkish language
(16 rows)

Play around with a few and see which one suits you best.


By now you know how to create search vectors and queries, search on some data, and specify a text search configuration. Let’s get started on connecting these tools with your Django project.

Since tsvector and tsquery are both standard PostgreSQL data types, we can use them just as we might any other data type like int or varchar in a column definition. This is very handy. Otherwise, we would have to regenerate vectors on-the-fly, which might offset any speedup we get from using FTS in the first place. :)

For the sake of moving forward, let’s say we have a model called Employee in our Django project. We’re trying to speed up searches on anything related to an employee. Assume we’re storing their first name, last name, and full address.

class Employee(models.Model):
    first_name = models.CharField(max_length=64)
    last_name = models.CharField(max_length=64)
    address_1 = models.TextField()
    address_2 = models.TextField()
    city = models.CharField(max_length=64)
    state = models.CharField(max_length=2)
    zip = models.CharField(max_length=9)

We’re now going to create a tsvector based on these fields. To do this, we’ll concatenate all of the fields into a single string and feed it into the to_tsvector function.

dan=# SELECT to_tsvector('simple', concat_ws(' ', first_name, last_name, address_1, address_2, city, state, zip)) FROM employee;
 '123':3 '12345':8 'in':7 'main':4 'pawnee':6 'ron':1 'st':5 'swanson':2
(1 row)

If we wanted to stop here, we could just do FTS matches on the fly, but this would be quite slow.

dan=# SELECT COUNT(*) FROM employee WHERE to_tsvector('simple', concat_ws(' ', first_name, last_name, address_1, address_2, city, state, zip)) @@ 'swanson';
 count
-------
     1
(1 row)

The next step is to store this tsvector in a field and keep it updated when the data changes. First step seems easy enough:

dan=# ALTER TABLE employee ADD COLUMN fts_document tsvector;

Hmm, but how do we keep this updated? There are a bunch of options, but according to the PostgreSQL documentation, the best one is to add a trigger that runs on row inserts and updates. To make things easy, we’re going to create a function to generate the FTS document. E.g.,

CREATE FUNCTION employee_fts_document(e employee) RETURNS tsvector AS $$
DECLARE
    employee_document TEXT;
BEGIN
    SELECT concat_ws(' ', e.first_name, e.last_name, e.address_1, e.address_2,, e.state, INTO employee_document;
    RETURN to_tsvector('pg_catalog.simple', employee_document);
END;
$$ LANGUAGE plpgsql;

We can then reference this function in the trigger.

CREATE FUNCTION employee_fts_document_trigger() RETURNS TRIGGER AS $$
BEGIN
    NEW.fts_document=employee_fts_document(NEW);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Finally, we tell PostgreSQL to execute the trigger on employee updates and inserts.

CREATE TRIGGER employee_fts_update_trigger BEFORE UPDATE ON employee FOR EACH ROW EXECUTE PROCEDURE employee_fts_document_trigger();
CREATE TRIGGER employee_fts_insert_trigger BEFORE INSERT ON employee FOR EACH ROW EXECUTE PROCEDURE employee_fts_document_trigger();

We’re almost done. The last thing we might want to do is to tell PostgreSQL that we’d like to index the fts_document column which contains our full-text search vectors. We’re going to add a GIN (Generalized Inverted Index)-based index to our column. You can read about the GIN-based and GiST-based indices in the PostgreSQL documentation. Here’s how we add our index.

CREATE INDEX employee_fts_index ON employee USING gin(fts_document);

That’s it. Our employee table now has an always up-to-date search index that we can use. Before we continue, let’s instantiate the search vectors. We’ll then test everything out to make sure things are working a-ok.

dan=# UPDATE employee SET fts_document=employee_fts_document(employee);
dan=# SELECT COUNT(*) FROM employee WHERE fts_document @@ 'ron';
(1 row)



Now that we understand the internals, wouldn’t it be nice if we could keep our databases up-to-date without having to manually copy and paste commands into our PostgreSQL shell? South to the rescue!

We’ll need to generate a blank migration.

$ ./ schemamigration app create_employee_fts
Created You must now edit this migration and add the code for each direction.

Next, open up the migration, and copy in the SQL above in the forwards and backwards methods, as illustrated below.

# ... imports, etc ...

from django.db import connection

class Migration(SchemaMigration):

    def forwards(self, orm):
        sql = """
CREATE FUNCTION employee_fts_document(e employee) RETURNS tsvector AS $$
DECLARE
    employee_document TEXT;
BEGIN
    SELECT concat_ws(' ', e.first_name, e.last_name, e.address_1, e.address_2,, e.state, INTO employee_document;
    RETURN to_tsvector('pg_catalog.simple', employee_document);
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION employee_fts_document_trigger() RETURNS TRIGGER AS $$
BEGIN
    NEW.fts_document=employee_fts_document(NEW);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
"""
        cursor = connection.cursor()
        cursor.execute(sql)
        cursor.execute("ALTER TABLE employee ADD COLUMN fts_document tsvector")
        cursor.execute("UPDATE employee SET fts_document=employee_fts_document(employee)")
        cursor.execute("CREATE TRIGGER employee_fts_update_trigger BEFORE UPDATE ON employee FOR EACH ROW EXECUTE PROCEDURE employee_fts_document_trigger()")
        cursor.execute("CREATE TRIGGER employee_fts_insert_trigger BEFORE INSERT ON employee FOR EACH ROW EXECUTE PROCEDURE employee_fts_document_trigger()")
        cursor.execute("CREATE INDEX employee_fts_index ON employee USING gin(fts_document)")

    def backwards(self, orm):
        cursor = connection.cursor()
        cursor.execute("DROP INDEX employee_fts_index")
        cursor.execute("ALTER TABLE employee DROP COLUMN fts_document")
        cursor.execute("DROP TRIGGER employee_fts_update_trigger ON employee")
        cursor.execute("DROP TRIGGER employee_fts_insert_trigger ON employee")
        cursor.execute("DROP FUNCTION employee_fts_document (employee)")
        cursor.execute("DROP FUNCTION employee_fts_document_trigger ()")

    models = {
        # App-specific model definitions here...
    }

Now, just run this migration…

$ ./ migrate app

…and watch as your FTS column, indices, and triggers are automagically created.

The Django Admin

OK, so we have our FTS document created, have generated a migration to keep things in sync across machines, and are indexing everything we might possibly want to search on to find an employee. The next step is to use what we have and integrate it into the Django Admin. This assumes that you have an EmployeeAdmin object similar to the below:

class EmployeeAdmin(admin.ModelAdmin):
    search_fields = ("first_name", "last_name", "address_1", "address_2")
    # ... other admin configuration here ..., EmployeeAdmin)

Our goal is to make searching with the Django Admin use our shiny new FTS index. To do this, we’re going to need to override the get_search_results method in EmployeeAdmin to something like this:

class EmployeeAdmin(admin.ModelAdmin):
    # ...

    def get_search_results(self, request, queryset, search_term):
        if search_term == '':
            queryset, distinct = super(EmployeeAdmin, self) \
                    .get_search_results(request, queryset, search_term)
        else:
            queryset, distinct = super(EmployeeAdmin, self) \
                    .get_search_results(request, queryset, None)
            queryset = queryset.extra(
                where=["employee.fts_document @@ to_tsquery('simple', %s)"],
                params=[search_term]
            )

        return queryset, distinct

Looks close, but we’re missing something. to_tsquery expects a “properly-formed” query, and we can’t always trust or depend on users of the admin to know PostgreSQL search syntax (don’t blame them). We need to somehow sanitize the input before feeding it into PostgreSQL.

Here are some examples of things that we might want to handle:

  • Strip padded whitespace.
  • Remove duplicated whitespace.
  • Add &’s between words that are only separated by spaces.
  • Replace double quotes (which PostgreSQL doesn’t like) with single quotes.
  • Clear up situations where search operators are used ambiguously (e.g., “ron & | swanson”).
  • Add a prefix wildcard to every search term, so that a search for “swan” will return “Ron Swanson” in the search results.

Luckily for you, I’ve already done all of the heavy lifting. Below is a function called sanitize_search_term which will turn a string into a properly-formed PostgreSQL search query.

import re
import string

def sanitize_search_term(term):
    # Replace all punctuation with spaces.
    allowed_punctuation = set(['&', '|', '"', "'"])
    all_punctuation = set(string.punctuation)
    punctuation = "".join(all_punctuation - allowed_punctuation)
    term = re.sub(r"[{}]+".format(re.escape(punctuation)), " ", term)

    # Substitute all double quotes with single quotes.
    term = term.replace('"', "'")
    term = re.sub(r"[']+", "'", term)

    # Create regexes to find strings within quotes.
    quoted_strings_re = re.compile(r"('[^']*')")
    space_between_words_re = re.compile(r'([^ &|])[ ]+([^ &|])')
    spaces_surrounding_letter_re = re.compile(r'[ ]+([^ &|])[ ]+')
    multiple_operator_re = re.compile(r"[ &]+(&|\|)[ &]+")

    tokens = quoted_strings_re.split(term)
    processed_tokens = []
    for token in tokens:
        # Remove all surrounding whitespace.
        token = token.strip()

        if token in ['', "'"]:
            continue

        if token[0] != "'":
            # Surround single letters with &'s.
            token = spaces_surrounding_letter_re.sub(r' & \1 & ', token)

            # Specify '&' between words that have neither | nor & specified.
            token = space_between_words_re.sub(r'\1 & \2', token)

            # Add a prefix wildcard to every search term.
            token = re.sub(r'([^ &|]+)', r'\1:*', token)

        processed_tokens.append(token)

    term = " & ".join(processed_tokens)

    # Collapse ampersands or pipes surrounded by ampersands.
    term = multiple_operator_re.sub(r" \1 ", term)

    # Escape single quotes.
    return term.replace("'", "''")

Combining everything together, we get the following:

class EmployeeAdmin(admin.ModelAdmin):
    # ...

    def get_search_results(self, request, queryset, search_term):
        if search_term == '':
            queryset, distinct = super(EmployeeAdmin, self) \
                    .get_search_results(request, queryset, search_term)
        else:
            search_term = sanitize_search_term(search_term)
            queryset, distinct = super(EmployeeAdmin, self) \
                    .get_search_results(request, queryset, None)
            queryset = queryset.extra(
                where=["employee.fts_document @@ to_tsquery('simple', %s)"],
                params=[search_term]
            )

        return queryset, distinct

And…we’re done! Searches in the admin will now use PostgreSQL FTS.

Here’s to speedy searching!

EDIT 12/15/2014: In a previous version of this post, employee_fts_document would pull stale values from the DB while generating the FTS document and performed an extraneous SELECT. Thanks, Roger!

Friday, November 8, 2013

Review: iPhone 5s Case

I normally don’t write reviews, but I figured it’s never a bad time to start.

I’ll jump right into it.

I have never before gotten a case for my iPhone. Cases add bulk, take away from the aesthetic of the phone, and are just another “thing” to keep track of. But when I heard about the leather case that Apple created specifically for the iPhone 5s, I thought that it would be the end of my no-case journey.

It came a couple of days ago in the mail. I ordered from Best Buy online. The cases come in 6 color choices—brown, beige, black, yellow, blue, and red. I went with black, since it was the only one that could really go with space gray. Not to mention, I’d read a bunch of reviews that said the other colors (especially the brighter ones) were prone to picking up dirt quickly.

The case looks and feels great in the hand. The leather is no frills and simple. It’s not the sort of leather you’d see on a cowboy boot, but supposedly it’s real. There is a stamped Apple logo on the back. It looks pretty cool, and gives it more authenticity, I guess.

Putting the case on is quite easy. It really is fitted quite well to the form of the phone. Sort of expected for a case, especially if you’re also the manufacturer of the phone itself, but I suppose it’s still a good thing.

As for bulkiness, it didn’t add much, but it was enough to be noticeable. In terms of weight and form factor, the iPhone 5s with the leather case “felt” like my 4S in the hand. This isn’t necessarily a bad thing (some people might like that), but it’s not totally for me.

The biggest problems I have with the case are more functional and less visual.

1. The leather cover over the power button is a bit too stiff. One should not have to struggle when pressing the power button.
2. The volume buttons are not covered like the power button is, but they’re still quite hard to reach. It’s not impossible by any means, but it’s just hard enough to be annoying.

Given that I wasn’t blown away by the look and feel and the usability problems that it has, I decided to return it. I think I’m still in the market for a case, just not the Apple one.

View the rest of the archives