Over the past few months, the widely-used LAS file format went through a revision process to make some incremental changes to the 1.3 version of the format. Considerable controversy surrounded one aspect of the proposed new 1.4 version, and the lidar community got to watch a very messy process play out on some public mailing lists and forums. One of the questions underlying the 1.4 debate was what role, if any, the lidar community at large should have in shaping the LAS specification. Is LAS an open specification? What does it mean for a specification or standard to be open?[1]
Before you adopt or use a specification, you should pause, if only briefly, to think about what sort of leverage you have, and might need, in controlling the future of that specification. Consider three examples of how your use of a hypothetical point cloud file format specification might lead to problems:
You find that some vendors who also implement the file format are generating files that your application can't read, a problem which you determine to be caused by different interpretations of an ambiguous clause in the specification. Who do you ask for the official clarification of the clause?
You find that your data processing workflow would really benefit from having a surface normal field added to the file format. Do you have a conduit for requesting this new feature be added to a future version?
The owner of the specification you are using has just proposed a significant change to the file format which will cause serious problems for you and your users. Do you have a vote in the matter, or at least an avenue for expressing your concerns?
Specifications are written and maintained by groups of many different types and for many different reasons. The goals of the specification maintainers simply might not align with yours, and so you need to know what factors to consider and what issues to look for when choosing a standard or specification. Let's look at four different point cloud file format specifications and try to identify some of the areas where they differ in their degree of openness.
Four formats
LAS is an uncompressed, binary file format.[2] It is controlled by ASPRS, which is not a standards organization. The ASPRS board appoints a committee chair, who then accepts members of the committee at his discretion. Changes to the specification are controlled by the committee chair. The final draft of a new version undergoes a public review/comment period, and the committee reviews the comments received. The copyright for the specification document is owned by ASPRS. No reference implementation of the format is provided.
MrSID is a compressed, binary file format which recently added support for lidar data.[3] It is controlled by LizardTech, a for-profit company: all changes to the format are made exclusively by LizardTech. It is considered a de facto standard, meaning that while not controlled by any standards body it is nonetheless widely used in the industry. Implementations are available from LizardTech, but only in closed, binary form. There is no public specification document.
E57 is a (new) uncompressed, binary file format.[4] It is controlled by ASTM, which is a well-known standards body but with relatively little software expertise. The E57 committee chair is elected from the ASTM membership, and committee membership is open to all ASTM members. Changes to the specification are controlled by committee vote. The final draft of a new version undergoes a public review/comment period, and the committee reviews the comments received. The copyright for the specification document is owned by ASTM. An open source reference implementation has been independently produced by the members of the committee; the software is not part of the ASTM standards process.
XYZ, also known variously as ASCII or ASCII/xyz or just text, is an uncompressed, text format.[5] It is not controlled by anyone: it is simply a convention that has developed over the years as a least common denominator interoperability solution. There is no specification. The format assumes data external to the file to define the column separator and to define which columns represent which fields; various informal conventions exist for specifying this external data. There are many implementations of XYZ readers and writers.
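To make the "data external to the file" point concrete, here is a minimal Python sketch of an XYZ reader. It is not based on any particular implementation: the function name `read_xyz` and its `separator` and `columns` parameters are hypothetical, chosen only to show that the caller has to supply information the file itself does not carry.

```python
import io


def read_xyz(stream, separator=None, columns=("x", "y", "z")):
    """Parse XYZ-style text into a list of dicts, one per point.

    `separator` and `columns` are hypothetical parameters standing in for
    the external data the format relies on. separator=None splits on any
    run of whitespace; otherwise the given string is the delimiter.
    """
    points = []
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(separator)
        if len(fields) < len(columns):
            raise ValueError(
                f"line {line_number}: expected {len(columns)} fields, "
                f"found {len(fields)}"
            )
        # Pair each declared column name with the field in that position.
        points.append(
            {name: float(value) for name, value in zip(columns, fields)}
        )
    return points


# The same text yields different points depending on the column layout
# the caller assumes; nothing in the file says which one is right.
text = "1.0,2.0,3.0,40\n4.0,5.0,6.0,50\n"
as_xyzi = read_xyz(io.StringIO(text), separator=",",
                   columns=("x", "y", "z", "intensity"))
as_yxz = read_xyz(io.StringIO(text), separator=",",
                  columns=("y", "x", "z"))
```

Both calls succeed, which is the interoperability hazard: a reader that guesses the wrong layout gets plausible but wrong coordinates rather than an error.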
So which one is Open?
Those four file formats are all in use today, and I've heard all of them described as "open" at one time or another. Yet their specification processes operate in very different ways. For technical, political, commercial, or simply historical reasons, the openness of each format varies with respect to committee membership, change approval process, acceptance of public comments, and the existence of reference implementations.
Especially for file formats, using an existing specification is almost always better than inventing your own new one: you get interoperability, you get to build on years of prior art and expertise, and sometimes you even get free implementations. But before you adopt an existing specification, a little due diligence is in order. Ask who owns that specification, find out what they really mean when they claim to be "open", and consider to what degree (if any!) that specification owner has your particular interests in mind.
Full disclosure: I was one of the people participating in the LAS 1.4 debate; I am currently a member of the E57 committee; and I worked at LizardTech on the MrSID file format.
[1] Though often used interchangeably, a specification and a standard are subtly different things. A standard is a set of normative processes, criteria, definitions, etc., and may be treated abstractly. A specification defines a set of concrete formal technical requirements, such as a file format; a specification is a type of standard.
[2] http://www.asprs.org/a/society/committees/standards/lidar_exchange_format.html
[4] http://www.astm.org/COMMIT/SUBCOMMIT/E5704.htm
[5] See http://www.gdal.org/frmt_xyz.html for one specification.