HDF5! Why? Because! (All complaints about this should definitely go to tb2332@columbia.edu.) In short, HDF5 is a format developed at NCSA, and now maintained by The HDF Group, to handle 1) large, 2) heterogeneous, 3) hierarchical datasets. The data can be compressed (10%-15% more than matfiles), and the I/O speed is still impressive. Also note that the core library is free of charge and wrappers exist for most languages (see the code tab).
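To make "hierarchical, heterogeneous, compressed" concrete, here is a minimal sketch using the h5py wrapper and NumPy. The file name, group names, and fields are made up for illustration and are not the layout of any real dataset file; it only assumes that h5py and NumPy are installed.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file, group, and field names, chosen only for illustration.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

with h5py.File(path, "w") as f:
    # Hierarchical: groups nest like directories.
    analysis = f.create_group("analysis")
    # A larger numeric array, stored gzip-compressed.
    analysis.create_dataset(
        "samples",
        data=np.zeros((1000, 12), dtype="f8"),
        compression="gzip",
    )
    # Heterogeneous: a compound record mixing a string and a float.
    song_dtype = np.dtype([("title", "S32"), ("duration", "f8")])
    songs = np.array([(b"Some title", 215.3)], dtype=song_dtype)
    f.create_group("metadata").create_dataset("songs", data=songs)

with h5py.File(path, "r") as f:
    # Datasets are addressed by a path through the group hierarchy.
    samples_shape = f["analysis/samples"].shape
    compression = f["analysis/samples"].compression
    records = f["metadata/songs"][:]  # read the whole compound table
    title = records["title"][0].decode("utf-8")
    duration = float(records["duration"][0])

print(samples_shape, compression, title, duration)
```

Everything lives in a single file, and the reader only pulls the datasets it asks for, which is part of why this tends to scale better than many small files.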
Is it perfect? No. Does it make more sense than 1M zipped JSON files or matfiles? Yes. Note that the new matfile format (v7.3) is actually based on HDF5.
Still not happy? Here is Python code or MATLAB code to transform the data into matfiles (the latter is less tested; non-ASCII strings seem to be wrongly encoded).
