[boost] [MMap/VM] RFC

3 Mar 2016

      Hello everyone,

for quite some time I've been working on a portable mmap/virtual memory 
library and I've finally found enough time to bring it out of the 'my 
internal little tool' state into something presentable: 
https://github.com/psiha/mmap (C++14 currently).
I would now kindly ask fellow devs for opinions (on the good, the bad, 
the ugly and the future;)
First let me answer the basic questions (i.e. on motivation and scope)...

Q&A:

(1) Why?

Considering Boost already offers two related solutions
* Interprocess: 
http://www.boost.org/doc/libs/release/doc/html/interprocess/sharedmemorybetw...
* Iostreams: 
http://www.boost.org/doc/libs/release/libs/iostreams/doc/classes/mapped_file...
why do we want a new one?

Even if the two mentioned solutions were adequate, the problem domain 
would still merit a separate, dedicated library merely considering its 
complexity (which will become apparent in later points).
(I suspect Ion would agree here considering he actually authored a 
related standardization proposal:)

(2) Why a completely new library instead of a repackaging of existing 
functionality?

I'm not satisfied with (a) the API semantics/design, (b) the API 
'power'/library capabilities and (c) the implementation efficiency/overhead.
a) Insistence on POSIX semantics: shared memory objects have to be 
persistent (kernel lifetime) and resizable even though not everyone wants 
or needs this (https://svn.boost.org/trac/boost/ticket/4827). This has 
implications both:
- in the interface (requiring manual cleanup guards, additional platform 
specific shm types like windows_shared_memory)
- and the overhead of the library (emulating shm with mapped files on 
major platforms that do not offer full POSIX compliance: Windows, OSX, 
iOS and Android)

MMAP solves this with one class template 'named_memory' and policies:
https://github.com/psiha/mmap/blob/master/include/boost/mmap/mappable_object...
so the user can choose:
named_memory<scoped, resizable>, named_memory<persistent, fixed> or any 
other combination and the library chooses the best implementation.
On POSIX, scoped semantics are achieved with a SEM_UNDO-enabled SysV 
semaphore acting as a kernel/system-global reference counter (which is 
then also used for automatic cleanup of shm zombies left by 
killed/crashed processes).
On Windows the user can choose the Win32 backend (file emulation) or the 
NativeNT (not yet finished/committed) backend (with native persistence 
and resizability).
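
To make the POSIX part above more concrete, here is a minimal sketch of a SEM_UNDO-based reference counter. It is not the library's code: the helper names (create_ref_counter, adjust_ref, ref_count, destroy_ref_counter) are made up for illustration, and it uses IPC_PRIVATE instead of a system-global key just to stay self-contained. Only semget/semctl/semop are real SysV calls.

```cpp
#include <sys/ipc.h>
#include <sys/sem.h>
#include <cassert>

// Hypothetical helper names; only semget/semctl/semop are real SysV calls.

// Linux expects the caller to define the semctl() argument union itself.
union semun_arg { int val; struct semid_ds* buf; unsigned short* array; };

// Create a semaphore set holding one semaphore, explicitly initialised to 0.
// A real cross-process counter would use a shared key instead of IPC_PRIVATE.
inline int create_ref_counter() {
    int const id = ::semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id == -1) return -1;
    semun_arg arg;
    arg.val = 0;
    if (::semctl(id, 0, SETVAL, arg) == -1) {
        ::semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

// Add delta to the counter. SEM_UNDO asks the kernel to revert this
// process's net adjustment when the process exits, even if it was killed.
inline bool adjust_ref(int id, short delta) {
    assert(id != -1);
    struct sembuf op;
    op.sem_num = 0;
    op.sem_op  = delta;
    op.sem_flg = SEM_UNDO | IPC_NOWAIT;
    return ::semop(id, &op, 1) == 0;
}

// Current value of the counter (or -1 on error).
inline int ref_count(int id) { return ::semctl(id, 0, GETVAL); }

// Remove the semaphore set from the system.
inline void destroy_ref_counter(int id) { ::semctl(id, 0, IPC_RMID); }
```

The idea is that every process attaching to the shm object does adjust_ref(id, +1); whoever observes the count dropping to zero can unlink the object, and a crashed process cannot leave the count stuck because the kernel undoes its increment.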

b)
- lack of a meta layer (e.g. so one can ask is_mappable<FILE *> or 
is_mappable<boost::filesystem::path>, is_mappable<HANDLE> etc...)
- lack of related utility functions, for example:
  -- map_read_only_file( path ) which will open the file for reading, 
query its size, create a read only file mapping and map/return a read 
only view of the file
  -- guarded_operation( view, operation, error_handler ) - execute 
operation wrapped in SEH(Windows)/scoped signal handler(POSIX) guards 
that will catch access violations (e.g. when mmaping network files) and 
gracefully execute error_handler (with the faulting address as the 
parameter)
https://github.com/psiha/mmap/blob/master/include/boost/mmap/mapped_view/gua...
- in MMAP the mapped_view object (which should probably be renamed to 
just 'view' considering the enclosing mmap namespace) is just a RAII 
wrapper around an iterator_range thereby providing the standard 
begin/end(), front/back(), operator[], etc. interface (as opposed to a 
get_data(), get_size() like interface)
- general (not specific to mapping) virtual memory functionality (to be 
discussed in a separate point, MMAP also completely lacks this currently)
- MMAP offers a comprehensive 'flags' system (e.g. 
https://github.com/psiha/mmap/blob/master/include/boost/mmap/flags/flags.hpp) 
for everything from object-level and system-level access privileges, 
over object parent-child inheritance to system life-time and 
access-pattern optimisation hints. Quite some time was invested in this 
area to produce a normalized interface that works for all objects 
(files, mappings...) while producing (near)zero adjustment codegen. 
Flags are 'packed'/grouped in structs (e.g. struct access_privileges 
with object_access, child_access and system_access members) with public 
members so that, after flags are created with a factory function (a 
portable API) the flags can be further tweaked for a specific platform 
(e.g. adding some FreeBSD specific mmap hint which isn't covered by the 
portable/documented MMAP API)
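
For readers wondering what the 'meta layer' from the first bullet could look like, here is a rough sketch of an is_mappable trait. Everything in it (the sketch namespace, the posix_fd wrapper, the chosen specialisations) is an assumption made for illustration and not the library's actual definition.

```cpp
#include <cstdio>
#include <string>
#include <type_traits>

// Hypothetical sketch: names and specialisations are illustrative only.
namespace sketch {

// Primary template: a type is not mappable unless stated otherwise.
template <typename T>
struct is_mappable : std::false_type {};

// Types that can refer to (or name) a file.
template <> struct is_mappable<std::FILE*>  : std::true_type {};
template <> struct is_mappable<char const*> : std::true_type {};
template <> struct is_mappable<std::string> : std::true_type {};

// A tagged wrapper so that not every int is treated as a file descriptor.
struct posix_fd { int value; };
template <> struct is_mappable<posix_fd> : std::true_type {};

// Convenience variable template; decay strips references and cv-qualifiers.
template <typename T>
constexpr bool is_mappable_v = is_mappable<std::decay_t<T>>::value;

} // namespace sketch

static_assert( sketch::is_mappable_v<std::FILE*>, "FILE* is mappable");
static_assert(!sketch::is_mappable_v<double>,     "double is not mappable");
```

Generic code can then dispatch on the trait (enable_if, static_assert, tag dispatch) instead of hard-coding a list of accepted handle types.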

c) It is my view (if not a "self evident truth";D) that libraries that 
merely wrap existing low level functionality (such as OS or CRT APIs) 
should allow you to write code that is safe, portable and looks 
reasonably nice while at the same time incurs (near)zero overhead (i.e. 
with a reasonably intelligent compiler, produces codegen that looks 
nearly the same as it would had one used the underlying API directly) - 
and the existing solutions fail that. In the two related tickets I went 
into more detail on this so I'll avoid spamming this post by repeating 
those objections and analysis. Rather I'll present a trivial example 
that demonstrates what wannabeboost::mmap currently produces with a 
decent compiler, https://gist.github.com/psiha/c0823fefc01fa3b39662:

#define BOOST_MMAP_HEADER_ONLY
#include <boost/mmap/mapped_view/mapped_view.hpp>
#include <boost/mmap/mappable_objects/file/utility.hpp>

int main( int /*argc*/, char * /*argv*/[] ) noexcept
{
     auto maybe_foo_view( boost::mmap::map_read_only_file( "foo" )() );
     if ( !maybe_foo_view )
         return static_cast<int>( maybe_foo_view.error() );
     if ( maybe_foo_view->empty() )
         return -1;
     return (*maybe_foo_view)[ 0 ];
}

with Xcode 7.2.1 Clang -O3 build for x64 produces 
https://gist.github.com/psiha/f92a7b8a93c5ce1736ae
(notice how the error handling is also correctly detected as such and 
placed at the end of the function, after the main return...)

Besides not using shared_ptr pimpls or saving paths in std::strings as a 
'nice to have' (YAGNI!:), part of the way this codegen is achieved is 
through the use of (also wannabe) Boost.Err 
(https://github.com/psiha/err) which makes it possible to avoid EH (in 
such simple/'localised' examples). It is also the reason for the 
somewhat awkward (optional<>-like) syntax:
  - map_read_only_file() returns a 
mmap::fallible_result<mmap::mapped_view> (an alias for 
err::fallible_result<mmap::mapped_view, mmap::error>, an "rvalue-only" type)
  - which is converted/'saved'/'pinned' into a result_or_error (the 
maybe_foo_view variable) with the additional operator() call
  - the !maybe_foo_view checks whether the call succeeded or the 
returned object contains an error
  - if there is an error, return the error (errno) code
  - else check if the view/file is empty (optional<>-like syntax for 
accessing the contained object through the -> and * operators)
  - else return the value of the first character in the file.
Boost.Err was recently discussed on this list so I'll skip most of that 
now, let me just say that it also supports classic EH coding style (it 
auto adapts, no need for reconfiguration, macros or anything like that), 
i.e. the above code can be rewritten as:

int main( int /*argc*/, char * /*argv*/[] ) noexcept
{
   using namespace boost::mmap;
   try
   {
     read_only_mapped_view const foo_view( map_read_only_file( "foo" ) );
     if ( foo_view.empty() )
         return -1;
     return foo_view[ 0 ];
   }
   catch ( std::runtime_error const & )
   {
      return error::get();
   }
}
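
To illustrate how one return type can serve both coding styles, below is a deliberately simplified toy. It is not how Boost.Err is implemented: the class bodies, the success/failure factories and the parse_digit function are assumptions made up for this sketch, and it requires default-constructible T and E to stay short.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Toy sketch only; the real Boost.Err types are more involved.
namespace sketch {

// Holds either a value or an error; safe to keep in a local variable.
template <typename T, typename E>
class result_or_error {
public:
    static result_or_error success(T v) { return result_or_error(true,  std::move(v), E{}); }
    static result_or_error failure(E e) { return result_or_error(false, T{}, std::move(e)); }

    explicit operator bool() const { return ok_; }
    T&       operator*()   { assert(ok_);  return value_;  }
    T*       operator->()  { assert(ok_);  return &value_; }
    E const& error() const { assert(!ok_); return error_;  }

private:
    result_or_error(bool ok, T v, E e)
        : ok_(ok), value_(std::move(v)), error_(std::move(e)) {}
    bool ok_;
    T    value_;
    E    error_;
};

// "Rvalue-only" return type: both accessors are &&-qualified, so the caller
// must decide on the spot between error-code style and exception style.
template <typename T, typename E>
class fallible_result {
public:
    explicit fallible_result(result_or_error<T, E> r) : r_(std::move(r)) {}

    // Error-code style: pin the outcome for later inspection, never throws.
    result_or_error<T, E> operator()() && { return std::move(r_); }

    // Exception style: convert straight to T, throwing on failure.
    operator T() && {
        if (!r_) throw std::runtime_error("operation failed");
        return std::move(*r_);
    }

private:
    result_or_error<T, E> r_;
};

// Stand-in for something like map_read_only_file(): digit value or error 22.
inline fallible_result<int, int> parse_digit(char c) {
    using roe = result_or_error<int, int>;
    return fallible_result<int, int>(
        (c >= '0' && c <= '9') ? roe::success(c - '0') : roe::failure(22));
}

} // namespace sketch
```

With this, auto r( sketch::parse_digit( '7' )() ); gives the optional<>-like object from the first example, while int const v = sketch::parse_digit( '7' ); gives the EH style from the second.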

...now let me reverse the Q&A direction ;)

3. Library scope: currently the library covers the topic of memory 
mapping (of filesystem objects and virtual memory), however I think that 
'the final' library should cover all reasonably portable aspects of 
virtual memory (and be called something like vm with mmap as a nested 
namespace), tackling thingies such as:
- process working set
- portable low memory event handling, madvise, memfd...
- prefetching, locking/unlocking virtual memory to/from physical memory 
(e.g. for realtime sensitive data)
- allocators capable of contiguous resizing (for implementing realloc or 
vector.resize() that does no copying or moving, simply maps new pages at 
the end of the current allocation)
- https://fgiesen.wordpress.com/2012/07/21/the-magic-ring-buffer
...

does anyone have an objection/different approach to this?
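
As a taste of what the prefetching/locking bullet would wrap on POSIX systems, here is a minimal sketch built directly on mmap, posix_madvise and mlock. The region struct and the function names are hypothetical (nothing like this exists in the library yet), and MAP_ANONYMOUS is assumed to be available, as it is on Linux and current macOS.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Hypothetical names; only the mmap/posix_madvise/mlock calls are real APIs.
namespace sketch {

struct region { void* address; std::size_t size; };

// Map `pages` pages of zero-filled, private anonymous memory.
inline region allocate_pages(std::size_t pages) {
    std::size_t const page_size = static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
    std::size_t const size      = pages * page_size;
    void* const p = ::mmap(nullptr, size, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return region{ p == MAP_FAILED ? nullptr : p, size };
}

// Advisory hint that the range will be needed soon; the OS may ignore or
// reject it, so callers should not treat failure as fatal.
inline bool prefetch(region r) {
    return ::posix_madvise(r.address, r.size, POSIX_MADV_WILLNEED) == 0;
}

// Pin the range into physical memory (e.g. for realtime sensitive data).
// Can fail when the process's locked-memory limit is too low.
inline bool lock_in_ram(region r)     { return ::mlock(r.address, r.size) == 0; }
inline bool unlock_from_ram(region r) { return ::munlock(r.address, r.size) == 0; }

// Unmap the region again.
inline void release(region r) {
    assert(r.address != nullptr);
    ::munmap(r.address, r.size);
}

} // namespace sketch
```

A portable library would additionally have to map these onto VirtualLock, PrefetchVirtualMemory and friends on Windows, which is where most of the design work lies.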

4. Windows offers/has/uses the concept of an intermediate "mapping" 
object (i.e. you don't create a mapped view of a file, rather you create 
a 'mapping' of a file and then a view of the mapping). This of course 
complicates things but also gives more power, e.g. you can have a r/w 
file open and create a mapping (or several mappings) of it that only 
covers a part of its size and has stricter (e.g. read only at 
object/process level) or wider access privileges (e.g. on the system 
level, only the parent process/user can access the file but the mapping 
is accessible by everyone). Boost.MMAP retains this distinction in its 
API (mapping vs mapped_view classes)...comments/thoughts on this please?

5. The above (Windows specific) 'mapping' concept makes it possible to 
create "named file mappings" (so that the object gets a system global 
name, like a shared memory object) which then another process can open 
by its name, erasing the difference between file mappings and vm 
mappings for client processes (kind of like the interface vs hidden 
implementation distinction). MMAP uses this on Windows for 
file-backed/'emulated' shared memory (e.g. that which needs persistence 
and/or resizability) - it is created named so that client processes can 
open and access it as 'normal'/native Windows shared memory.
It might actually be possible to make this at least partially portable 
to POSIX systems that use virtual filesystems for implementing shared 
memory (e.g. Linux which uses /dev/shm) where we could symlink the file 
to the shm filesystem directory...(unfortunately I have no way of 
testing this as I have no Linux machine set up, I develop only for 
Windows, OSX, iOS and Android)...does this make any sense/would it be 
worth the hassle (i.e. ease any real world problems)?

6. The security part of the 'flags system' models the POSIX API: for 
named (system level) created objects you specify the permissions for the 
'user', 'group' and 'world'. 'Behind the scenes', on Windows, 'user' 
maps to the user that created the process, 'group' maps to all groups 
that the 'user' belongs to and 'world' maps to the Everyone group..?

When not using the predefined privilege/permission levels [e.g. 
process_default, unrestricted, nix_default (644)] things are currently 
pretty verbose here:
namespace flags = boost::mmap::flags;
using ap        = flags::access_privileges;
auto const default_privileges
(
   ap::system::user ( ap::all  ) |
   ap::system::group( ap::read ) |
   ap::system::world( ap::read )
);

I have a prototype implementation that shortens this with user defined 
literals, so that one can write something like "rwxr-xr--"_perm...All in 
all, this thing looks like it merits a separate library in its own right...
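
For illustration, a user defined literal of this kind can be written in plain C++14 as below. This is only a sketch of the idea and not the prototype mentioned above: the parse_perm helper and the choice of returning raw POSIX mode bits (rather than an access_privileges object) are assumptions.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: turn an "rwxr-xr--" style string into POSIX mode bits.
constexpr unsigned parse_perm(char const* s, std::size_t n) {
    if (n != 9) throw std::invalid_argument("expected exactly 9 characters");
    unsigned bits = 0;
    for (std::size_t i = 0; i < 9; ++i) {
        char const expected = "rwx"[i % 3];  // position decides r, w or x
        bits <<= 1;
        if (s[i] == expected) bits |= 1u;
        else if (s[i] != '-') throw std::invalid_argument("unexpected character");
    }
    return bits;
}

// The literal itself; malformed strings fail to compile when the result is
// used in a constant expression and throw at runtime otherwise.
constexpr unsigned operator""_perm(char const* s, std::size_t n) {
    return parse_perm(s, n);
}

static_assert("rwxr-xr--"_perm == 0754, "user: rwx, group: r-x, world: r--");
static_assert("rw-r--r--"_perm == 0644, "matches nix_default (644)");
```

A real version would presumably return the library's flags type so that the result can be passed wherever access_privileges is expected.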

7. Windows has another quirk up its sleeve: global vs local session 
objects. If you want to create a shared memory object visible across 
terminal sessions you have to prefix its name with "Global\" (and have 
admin privileges). This covers a server process running as a service 
(and therefore in session 0) talking to a client process created by a 
logged on user, as well as using the NativeNT backend/API to create a 
native persistent shared memory object, which has to be created in the 
global session '0'. I'm still pondering how to model this in the API or 
whether it should be automatically handled at all, e.g. whether to 
deduce if the Global\ prefix should be automatically added (based on the 
process privileges, type and access privileges of shm object being 
created etc) or leave it up to the user to know "if I'm running on 
Windows and want to do this and this I have to use the Global\ prefix"...
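
Purely as a sketch of the 'explicit' end of that spectrum (not something the library currently does), the scope could be a parameter and the prefixing a small helper. The session_scope enum, qualify_name and its target_is_windows parameter are all hypothetical names invented here; the parameter exists only so the behaviour can be exercised on any platform.

```cpp
#include <cassert>
#include <string>

// Hypothetical API sketch; none of these names exist in the library.
namespace sketch {

enum class session_scope { local, global };

#ifdef _WIN32
constexpr bool building_for_windows = true;
#else
constexpr bool building_for_windows = false;
#endif

// Returns the name to hand to the OS. Only Windows has the session
// namespace concept, so other targets get the name back unchanged.
inline std::string qualify_name(std::string const& name,
                                session_scope scope,
                                bool target_is_windows = building_for_windows) {
    assert(!name.empty());
    if (target_is_windows && scope == session_scope::global)
        return "Global\\" + name;
    return name;
}

} // namespace sketch
```

The automatic alternative would compute the scope argument from the process and object properties instead of asking the user for it.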

There's more but the opening post is already too long so I'll stop...for 
now ;)

-- 
"What Huxley teaches is that in the age of advanced technology, 
spiritual devastation is more likely to come from an enemy with a 
smiling face than from one whose countenance exudes suspicion and hate."
Neil Postman

Domagoj Saric