NOTE: The enhanced version is released under the same LGPL licence as the original module. Please do not contact the original author (Dan Pascu) regarding this version. Send your bug reports to python@cx.hu. Thanks.
Download
How to install on Linux
How to install on Windows
Example (not as complicated as it seems)
Throughput measurement, comparison with simplejson
To be done: Compatibility tests between python-cjson and simplejson.
A guide to porting applications from simplejson to python-cjson.
BUGFIX:
NEW FEATURES:
You can always pass unicode objects to the JSON encoder. If you want to pass str objects containing non-ASCII data please add the encoding='name-of-your-encoding' argument to the cjson.encode() function. Failing to do so may raise EncodeError. The default encoding is latin-1 for compatibility with existing python-cjson releases, but this should be changed to ascii in the future.
By default, cjson.decode() returns str objects for ASCII strings and unicode objects for all other strings. This is again for compatibility with existing python-cjson releases. As a result, a single cjson.decode() call can return both str and unicode objects, depending on the contents of the JSON strings. This behaviour can cause nasty bugs in your code when you expect str and get unicode, or vice versa.
To avoid mixing str and unicode types, it is recommended to add all_unicode=True to every cjson.decode() call and then expect only unicode objects in the output. Alternatively, you can specify an encoding by adding the encoding='name-of-your-encoding' argument to both cjson.encode() and cjson.decode() calls. In this case you will get only str objects in the specified encoding and won't have to handle unicode objects. Decoding a JSON string with characters not found in the specified encoding will raise DecodeError, so do not use a fixed encoding for data containing strings from multiple codepages. Tip: Using a fixed encoding helps a lot while prototyping your application, and it can be changed to generic unicode later in the release version.
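One way to decide between a fixed encoding and all_unicode=True is to check beforehand whether every string in your data fits the encoding you have in mind. The following is only a rough sketch: it uses the standard json module of a current Python 3 interpreter as a stand-in (not cjson), and strings_fit_encoding is a hypothetical helper name, not part of python-cjson.

```python
import json

def strings_fit_encoding(json_text, encoding):
    """Return True if every string in the JSON document can be
    represented in `encoding` (hypothetical helper, not a cjson API)."""
    def check(value):
        if isinstance(value, str):
            # Raises UnicodeEncodeError if a character is not in the codec.
            value.encode(encoding)
        elif isinstance(value, list):
            for item in value:
                check(item)
        elif isinstance(value, dict):
            for key, item in value.items():
                check(key)
                check(item)

    try:
        check(json.loads(json_text))
    except UnicodeEncodeError:
        return False
    return True

# U+00E9 exists in latin-1; U+0151 exists in iso-8859-2 but not in latin-1.
latin1_ok = strings_fit_encoding('["caf\\u00e9"]', 'latin-1')
mixed_ok = strings_fit_encoding('["caf\\u00e9", "\\u0151"]', 'latin-1')
```

If such a check fails for your data, a fixed encoding is probably the wrong choice and all_unicode=True is the safer option.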
Upgrade only if you need the new features above. This version includes new unit tests for these features. All existing and new unit tests pass with Python 2.3.5, 2.4.3 and 2.5.1 without problems, but silent bugs may exist.
NEW FEATURE: Optional automatic conversion of dict keys to string.
Since the JSON specification does not allow non-string keys in objects, optional automatic conversion of dictionary keys is very useful, for example when porting code originally written for simplejson, which does this by default. The feature can be enabled by passing the key2str=True keyword argument to the encode() function. The default behaviour of python-cjson has been preserved, so without this keyword argument, encoding non-string dictionary keys will raise EncodeError.
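To show what such a key conversion amounts to, here is a small pure-Python sketch. It is not the cjson implementation: keys_to_str is a hypothetical helper, it assumes keys are converted with str(), and it uses the standard json module of a current Python 3 interpreter as a stand-in encoder.

```python
import json

def keys_to_str(obj):
    """Recursively convert all dictionary keys to strings.

    Hypothetical helper for illustration; the conversion rule (str())
    is an assumption, not taken from python-cjson.
    """
    if isinstance(obj, dict):
        return {str(key): keys_to_str(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [keys_to_str(item) for item in obj]
    return obj

data = {1: "one", (2, 3): {"nested": {4.5: None}}}

# The standard encoder rejects the tuple key outright...
try:
    json.dumps(data)
    rejected = False
except TypeError:
    rejected = True

# ...but accepts the data once every key has been turned into a string.
encoded = json.dumps(keys_to_str(data), sort_keys=True)
```

Note that the conversion is one-way: after decoding, the keys stay strings and the original key types are not restored.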
Upgrade only if you need this new feature. This version includes new unit tests for the above feature. All existing and new unit tests pass with Python 2.3.5, 2.4.3 and 2.5.1 without problems, but silent bugs may exist.
BUGFIX: When a decoder extension function was called after the failure of an internal decoder (for example after failing to interpret new Date(...) as null), the internal exception was propagated (not cleared). It could then be raised incorrectly inside the decoder extension function, pointing to an otherwise correct statement in that function. This could cause severe confusion for the programmer and prevented such extension functions from executing.
You can reproduce this bug with python-cjson-1.0.3x2: Bug #20070401a
Make sure to install the following packages:
binutils gcc libc6-dev linux-kernel-headers python-dev
These are Debian package names. Install the equivalent packages on other distributions.
You can compile and install python-cjson by issuing
python setup.py install
as root or with sudo. There should be no errors. The cjson.so library file will be copied into the site-packages directory. The library file may contain debug symbols depending on your default compiler options. You can strip the library to save a bit of memory:
strip cjson.so
On Debian the library is installed into Python's main site-packages directory and not under /usr/local. The library file may be moved to the corresponding site-packages directory under /usr/local. You can test the library by running testjson.py. The build directory can be safely deleted after installation.
Please drop me a mail if you need binary releases for older Python releases (2.3 and 2.4) under Windows. Do not forget to write your exact python version. The minimum required Python version is 2.3.
To compile C extension modules for Python 2.4 or 2.5 with free tools you can use
Giovanni Bajo's GCC 4.1.2 MINGW installer
(I haven't tried it) or an older method using the official MinGW installer (this worked for me):
For Python 2.4 and 2.5 on Windows:
1. Install MinGW from http://sourceforge.net/projects/mingw/
2. Add C:\MinGW\bin to the system PATH (use the System applet from the Control panel)
3. Build your extension with the --compiler=mingw32 argument:
python setup.py build --compiler=mingw32
or put a distutils.cfg file under the C:\python\lib\distutils dir (or wherever you installed Python) containing the following entries:
[build]
compiler = mingw32
After that you can install extension modules as usual (without the --compiler flag):
python setup.py install
import re
import cjson
import datetime

# Encoding Date objects:
def dateEncoder(d):
    assert isinstance(d, datetime.date)
    return 'new Date(Date.UTC(%d,%d,%d))'%(d.year, d.month, d.day)

json=cjson.encode([1,datetime.date(2007,1,2),2], extension=dateEncoder)
assert json=='[1, new Date(Date.UTC(2007,1,2)), 2]'

# Decoding Date objects:
re_date=re.compile(r'^new\sDate\(Date\.UTC\(.*?\)\)')

def dateDecoder(json,idx):
    json=json[idx:]  # work on the part of the input that starts at idx
    m=re_date.match(json)
    if not m:
        # string exceptions are deprecated, so raise a real exception
        raise ValueError('cannot parse JSON string as Date object: %s'%json)
    # 18 == len('new Date(Date.UTC('); m.end()-2 drops the closing '))'
    args=cjson.decode('[%s]'%json[18:m.end()-2])
    dt=datetime.date(*args)
    return (dt,m.end()) # must return (object, character_count) tuple

data=cjson.decode('[1, new Date(Date.UTC(2007,1,2)), 2]', extension=dateDecoder)
assert data==[1,datetime.date(2007,1,2),2]
Download example.py
Note the extension keyword arguments.
simplejson 1.7.1
Test data: tuples in dicts in a list, 603887 bytes as JSON string
Encoder throughput: ~747 kbyte/s
Decoder throughput: ~272 kbyte/s
Test script modifications required to measure simplejson instead of cjson: simplejson is imported, then cjson.encode calls are replaced by simplejson.dumps and cjson.decode calls by simplejson.loads.
NOTE: The simplejson page states that 1.7.1 contains optional C code to speed up encoding. This is not used in the current test. A comparison including the speedup component will come soon. Stay tuned...
python-cjson 1.0.3x5 - compiler: gcc 3.4.2 mingw-special (MinGW 3.81)
Test data: tuples in dicts in a list, 603886 bytes as JSON string
Encoder throughput: ~9199 kbyte/s
Decoder throughput: ~9215 kbyte/s
python-cjson 1.0.3x5 - compiler: C compiler from Microsoft Visual C++ Toolkit 2003
Test data: tuples in dicts in a list, 603886 bytes as JSON string
Encoder throughput: ~9199 kbyte/s
Decoder throughput: ~8776 kbyte/s
It's interesting that the free gcc compiler builds a slightly faster decoder than the MS compiler. The stripped pyd (DLL) files are 17k for the MS compiler and 22k for gcc, so the MS build uses less memory (and possibly less code cache). Decoder throughput may differ due to some loop unrolling optimization or similar, but I have not verified this.
Please remember that python-cjson requires a C compiler, while simplejson uses only the standard library. Simplejson could be a better choice if you want portability, but I recommend python-cjson for performance-critical applications, such as servers and frequent data conversion tasks.
NOTE: The results above may not reflect the real-world performance of these packages, but they show a clear difference. Python-cjson is more than 10x faster in encoding and more than 30x faster in decoding, at least on this data. Similar results were found with other realistic data sets. Using extension functions with python-cjson and passing a lot of non-JSON-standard data can affect average performance, but does not slow down the processing of standard JSON data.
I've done performance testing on a dual Xeon 2.8GHz server with Debian Linux, and got excellent results:
Test data: tuples in dicts in a list, 603886 bytes as JSON string
Encoder throughput: ~9545 kbyte/s
Decoder throughput: ~16556 kbyte/s
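If you want to repeat this kind of measurement yourself, a throughput test can be sketched roughly as follows. This is not the original test script: make_test_data and measure_throughput are hypothetical names, the data only imitates the "tuples in dicts in a list" shape, 1 kbyte is assumed to be 1024 bytes, and the standard json module of a current Python 3 interpreter is used as a stand-in so that the sketch runs without any extension installed.

```python
import json
import time

def make_test_data(n_items=2000):
    """Build a list of dicts holding tuples (shape is an assumption)."""
    return [{"id": i, "point": (i, i * 0.5, "item-%d" % i)}
            for i in range(n_items)]

def measure_throughput(encode, decode, data, repeats=5):
    """Return (JSON size, encoder kbyte/s, decoder kbyte/s).

    The size is the length of the encoded string; this equals the byte
    count as long as the encoder emits pure ASCII output.
    """
    text = encode(data)
    size = len(text)

    start = time.perf_counter()
    for _ in range(repeats):
        encode(data)
    encode_time = max(time.perf_counter() - start, 1e-9)

    start = time.perf_counter()
    for _ in range(repeats):
        decode(text)
    decode_time = max(time.perf_counter() - start, 1e-9)

    kbytes = size * repeats / 1024.0
    return size, kbytes / encode_time, kbytes / decode_time

size, enc_rate, dec_rate = measure_throughput(
    json.dumps, json.loads, make_test_data())
print("Test data: %d bytes as JSON string" % size)
print("Encoder throughput: ~%d kbyte/s" % enc_rate)
print("Decoder throughput: ~%d kbyte/s" % dec_rate)
```

The encoder and decoder are passed in as callables, so the same function could be pointed at cjson.encode and cjson.decode, or at simplejson.dumps and simplejson.loads, in the same way the test script modifications above swap one pair of calls for the other.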