Low-latency logging can only be implemented efficiently if it is asynchronous. Using solid-state drives is irrelevant with regard to latency and doesn't provide any edge, because in an asynchronous design the logging thread never waits on the disk. The general advice still applies: eliminate memory allocations, data copies, lock contention and context switching. Write to a ring buffer in memory shared between the thread that logs and the thread that dispatches the message to the kernel, and if you are really skillful you could possibly implement it lock-free.
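
To make the ring buffer idea concrete, here is a minimal sketch of a lock-free single-producer/single-consumer queue in C++11. It assumes exactly one logging thread and one dispatcher thread. The names `LogRecord`, `SpscLogRing`, `try_push` and `try_pop` are made up for illustration, as are the 120-byte record size and the 64-byte cache line. It still copies the message text once into the slot, so treat it as a starting point rather than a finished logger.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-size record, so the hot path never allocates.
struct LogRecord {
    char text[120];
};

// Single-producer/single-consumer ring buffer.
// Capacity must be a power of two so an index can be masked instead of using modulo.
template <std::size_t Capacity>
class SpscLogRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Call only from the logging thread.
    // Returns false when the buffer is full, so the caller never blocks.
    bool try_push(const char* msg) {
        assert(msg != nullptr);
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) {
            return false;  // full: drop or count the message
        }
        LogRecord& slot = slots_[head & (Capacity - 1)];
        std::strncpy(slot.text, msg, sizeof(slot.text) - 1);
        slot.text[sizeof(slot.text) - 1] = '\0';
        // Release: the record is fully written before the consumer can see the new head.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Call only from the dispatcher thread.
    // Returns false when there is nothing to read.
    bool try_pop(LogRecord& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail == head) {
            return false;  // empty
        }
        out = slots_[tail & (Capacity - 1)];
        // Release: the slot has been copied out before the producer may reuse it.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

private:
    // Separate cache lines (64 bytes assumed) to avoid false sharing between the threads.
    alignas(64) std::atomic<std::size_t> head_{0};  // written by the producer only
    alignas(64) std::atomic<std::size_t> tail_{0};  // written by the consumer only
    std::array<LogRecord, Capacity> slots_{};
};
```

The logging thread calls `try_push` and moves on. The dispatcher thread loops on `try_pop` and hands each record to the kernel (for example with `write`), so the system call and any disk wait stay off the logging thread. What to do when the buffer is full (drop, count, or spin) is a policy decision this sketch leaves to the caller.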