Interest in an LLVM library?
Is there interest in an LLVM library? My motivation is to simplify the dynamic creation of functions, modules, and programs by defining an EDSL modeled after the C language. LLVM provides an API based exclusively on function calls that is verbose and difficult to use. The EDSL is more expressive and easier to use. (This may remind some people of Boost.Python and indeed these libraries are similar in motivation.)

The scope of this proposal is limited. The proposed library does not compile, link, optimize, save, or load anything, for example, since the LLVM API is suitable for that. Also, it may not support obscure features, even if they seem to be within its purview, since one can always make direct calls to the LLVM API.

I have written about 6500 lines of prototype code in the context of another project (a compiler for the language Curry, if anyone is interested). It is not a complete solution, but is by no means trivial, either. I'll discuss that next, but since the discussion is somewhat lengthy, let me first explain what I'm looking for by sending this. I think this EDSL would make a good Boost library. If others agree, then the next step is to review the design. I need second opinions and help from more knowledgeable people to improve the design. There are a few places in particular where I think a redesign is necessary. After that, I'm happy to implement the final library (and would gladly accept help, if anyone is interested).

Now, I'll show a few examples and provide links to the prototype code. First, here's how to create a module and put it in scope:

module const example_module("example");
scope _ = example_module;

The scope object sets the insertion point for types, functions, and global variables. We can create a type as follows:

type i32 = int_(32);

i32 is a 32-bit integer. It is easy to build more complex types, e.g., i32[2] is an array type, *i32 is a pointer type, and i32(i32,i32) is a function type. An expression like i32(42) produces a constant expression. As you might guess at this point, the prototype makes heavy use of SFINAE. I've gone to a lot of effort to make this work with everything I can think of, particularly multi-dimensional arrays and initializer_lists. So, an expression like (i32[3][2])({{1,2,3},{4,5,6}}) works! (Note: i32[3][2] is the C++ type int32_t[2][3] because that's the only sensible way to interpret successive calls to operator[]).

We can create a struct from a sequence of types like this:

type foo = struct_("foo", {i32, *i32});

Or, we can leave out the sequence to get an opaque struct. A function prototype can be created as follows:

function const f = extern_(i32(i32), "f", {"i"});

This function, f, takes an i32 (named "i") and returns an i32. The function is placed in example_module, since the code appears in its scope. There are two other linkage-specifying functions: static_ and inline_. extern_ and static_ are also used to create global variables (i.e., if the type is not a function). To define the body, we create another scope:

{
  scope _ = f;
  // code for f.
}

To understand this, we have to discuss scopes in more depth. There are three nested levels of scope: the current module, the current function, and the current basic block. The code above sets the current function to f and the current basic block to f's entry point. Generally speaking, within a function body we write statements that generate instructions. Those instructions are inserted into the current function at the end of the current basic block.

To define the body, we can declare values and operate on them. For example:

value i = arg("i");

Here, "arg" is a special function that fetches the named function parameter. We can add instructions to f using operators or special methods. For example:

value j = i + 1;
return_(j);

Branches manipulate the current basic block. Here's an if statement:

label true_path, false_path;
if_(i, true_path, false_path);
{
  scope _ = true_path;
  // code for true.
}
/* else */
{
  scope _ = false_path;
  // code for false.
}
// new basic block here.

This generates instructions to test i, take the appropriate branch, and then continue to a new basic block in the right places (which depends on whether the paths terminate with a branch instruction). It also updates the scope so that statements following if_() are inserted into the new basic block. Believe me when I say it is not trivial to do this with the LLVM API. Although it is sometimes necessary to pre-declare labels as shown above, we can usually employ C++11 lambdas to simplify the encoding:

if_(i
  , []{ /* code for true */ }
  , []{ /* code for false */ }
  );

The prototype code can be found under https://github.com/andyjost/Sprite-3/tree/master/include/sprite/backend. Also, there are many examples under https://github.com/andyjost/Sprite-3/tree/master/examples. Other parts of the Sprite-3 project are unrelated to this proposal. A good example to start with generates the Fibonacci numbers (https://github.com/andyjost/Sprite-3/blob/master/examples/fib.cpp). Here's the interesting part:

// Create a new module and associate it with this lexical scope.
module const fib_module("fib");
scope _ = fib_module;

// Declare types.
auto const char_ = types::char_();
auto const i32 = types::int_(32);
auto const i64 = types::int_(64);

// Declare external functions.
auto const printf = extern_(i32(*char_, dots), "printf");

// Define the @fib function.
auto const fib = extern_(i64(i64), "fib", {"n"});
{
  scope _ = fib;
  value n = arg("n");
  if_(n <(signed_)(2), []{ return_(1); });
  return_(fib(n-2) + fib(n-1));
}

// Define the @main function.
auto const main = extern_(i32(), "main");
{
  scope _ = main;
  value const x = fib(5);
  printf("fib(5)=%d\n", x);
  return_(0);
}

After this code runs, fib_module contains an intermediate representation of the program. The LLVM API can be used to save that to disk, compile it to assembly, link in the C library (to get printf), or JIT compile any of the functions, among other things.

Thanks to everyone who made it this far :) I'm looking forward to your feedback.

-Andy
Hi Andy, Andy Jost wrote:
Is there interest in an LLVM library? My motivation is to simplify the dynamic creation of functions, modules, and programs by defining an EDSL modeled after the C language.
Wouldn't this be better hosted by LLVM themselves? What Boost libraries are you using in your implementation? Regards, Phil.
Phil Endecott wrote:
Wouldn't this be better hosted by LLVM themselves?
I don't think so. The contributions in this case are defining the EDSL and implementing it in C++. Both require C++ expertise. These do not align well with the interests or skills of the LLVM community, which is focused on developing compiler technology and not necessarily populated with C++ experts. The Boost community seems far better positioned to host a technical C++ library such as this.
What Boost libraries are you using in your implementation?
Only Boost.Preprocessor.
Andrey Semashev wrote:
On 2015-11-19 02:23, Andy Jost wrote:
Phil Endecott wrote:
What Boost libraries are you using in your implementation?
Only Boost.Preprocessor.
Did you consider Boost.Proto?
Yes, but expression templates are not the right concept for this library because programs are built dynamically. Consider the task of writing a compiler. A reasonable approach would be to write a Boost.Spirit parser that uses the proposed library to encode the semantic actions, thereby building the program. I don't think Boost.Proto can do that, but I could be missing something obvious. Also, LLVM manages the program representation for us, providing only handles to internal objects. Therefore, I think the AST and transformations provided by Boost.Proto would be redundant at best. Please correct me if I'm wrong. -Andy
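To make that concrete, here is a minimal sketch of how a parser's semantic actions might drive the proposed EDSL. The Boost.Spirit X3 parts are real API; the EDSL calls (value, i32, return_) are the names used in this thread, and the grammar and the parse_and_emit function are invented purely for illustration.

// Minimal sketch: a Spirit X3 semantic action that emits IR through the
// proposed EDSL while parsing. The grammar and parse_and_emit are invented;
// the EDSL calls (value, i32, return_) follow the names used in this thread.
#include <boost/spirit/home/x3.hpp>
#include <string>

namespace x3 = boost::spirit::x3;

void parse_and_emit(std::string const & input)
{
  // Semantic action: turn the parsed integer into an IR constant and return it.
  auto emit_return = [](auto & ctx)
  {
    value c = i32(x3::_attr(ctx));  // EDSL: parsed int -> IR constant
    return_(c);                     // EDSL: generate a 'ret' instruction
  };

  auto first = input.begin();
  auto last = input.end();
  x3::phrase_parse(first, last, x3::int_[emit_return], x3::space);
}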
Andy Jost wrote on Tuesday, November 17, 2015 12:35 PM
Is there interest in an LLVM library? My motivation is to simplify the dynamic creation of functions, modules, and programs by defining an EDSL modeled after the C language. LLVM provides an API based exclusively on function calls that is verbose and difficult to use. The EDSL is more expressive and easier to use. (This may remind some people of Boost.Python and indeed these libraries are similar in motivation.)
If this can be used to (reasonably) easily make runtime JITed functions callable from the running executable it would be very useful. Erik
Nelson, Erik - 2 wrote:
If this can be used to (reasonably) easily make runtime JITed functions callable from the running executable it would be very useful.
Yes, it can! There is a testing module defined in the following files that demonstrates how to JIT functions. https://github.com/andyjost/Sprite-3/blob/master/include/sprite/backend/supp... https://github.com/andyjost/Sprite-3/blob/master/src/testing.cpp Some tests that use this module can be found in https://github.com/andyjost/Sprite-3/blob/master/examples/branches.cpp. There is also an example at https://github.com/andyjost/Sprite-3/blob/master/examples/sandbox.cpp.
On 11/18/2015 06:45 PM, Nelson, Erik - 2 wrote:
Andy Jost wrote on Tuesday, November 17, 2015 12:35 PM
Is there interest in an LLVM library? My motivation is to simplify the dynamic creation of functions, modules, and programs by defining an EDSL modeled after the C language. LLVM provides an API based exclusively on function calls that is verbose and difficult to use. The EDSL is more expressive and easier to use. (This may remind some people of Boost.Python and indeed these libraries are similar in motivation.) If this can be used to (reasonably) easily make runtime JITed functions callable from the running executable it would be very useful.
I wholeheartedly agree with this. I see the example you linked showing how to take a program built with the proposed library and JIT it for use in the running application. That's pretty nice as-is, but it would be even better if the JIT API was abstracted further, streamlining the process of creating a function, compiling it, then running it later in the application. While that is obviously possible today using LLVM's API, I feel like a simplified interface, similar to what you've already done, would go a long way to make it more accessible for users. Jason
Jason Roehm wrote:
a simplified interface, similar to what you've already done, would go a long way to make it more accessible for users.
OK, I'm convinced. LLVM is oriented towards compiler writers, who are expected to go deep into the details. This library is aimed at those who would use LLVM in relatively simple ways and who are not willing to learn the gory details. I agree that providing a complete interface for compiling, optimizing, and linking in the most common ways would be necessary for a complete library. -Andy
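To make the request concrete, here is a purely hypothetical sketch of the level of interface being discussed. None of these names (jit_engine, get) exist in the prototype; they would wrap LLVM's EngineBuilder/ExecutionEngine machinery.

// Purely hypothetical -- for discussion of the desired abstraction level only.
// jit_engine and get() do not exist in the prototype.
module const m("example");
// ... define functions in m with the EDSL ...

jit_engine jit(m);                                  // hypothetical wrapper around EngineBuilder
auto fib = jit.get<int64_t (*)(int64_t)>("fib");    // hypothetical typed symbol lookup
int64_t result = fib(10);                           // call the JITed function directly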
On Tue, Nov 17, 2015 at 8:35 PM, Andy Jost
Is there interest in an LLVM library? My motivation is to simplify the dynamic creation of functions, modules, and programs by defining an EDSL modeled after the C language. LLVM provides an API based exclusively on function calls that is verbose and difficult to use. The EDSL is more expressive and easier to use. (This may remind some people of Boost.Python and indeed these libraries are similar in motivation.)
The scope of this proposal is limited. The proposed library does not compile, link, optimize, save, or load anything, for example, since the LLVM API is suitable for that. Also, it may not support obscure features, even if they seem to be within its purview, since one can always make direct calls to the LLVM API.
I'm not sure I understand the purpose of this library. It seems to build an AST, but then it doesn't seem to offer anything to do with it? Is it just a wrapper for LLVM AST API?
Andrey Semashev wrote:
I'm not sure I understand the purpose of this library. It seems to build an AST
It builds an intermediate representation of the program. LLVM is a top-of-the-line compiler infrastructure. It does many things incredibly well but, unfortunately, does not provide an expressive API for defining programs. The API it provides is far too low-level. Let me demonstrate. Here is a five-line "hello world" program written in my EDSL:

auto const puts = extern_<Function>(i32(*char_), "puts");
auto const main = extern_<Function>(i32(), "main", [&] {
  puts("hello world\n");
  return_(0);
});

The product of this is an intermediate representation of the program. Regarding its use, the LLVM website says the following (http://llvm.org/docs/LangRef.html#introduction): "The LLVM code representation is designed to be used in three different forms: as an in-memory compiler IR, as an on-disk bitcode representation (suitable for fast loading by a Just-In-Time compiler), and as a human readable assembly language representation... The three different forms of LLVM are all equivalent." The EDSL produces the in-memory IR form. Here is the human-readable equivalent for this program:

@.str = private unnamed_addr constant [13 x i8] c"hello world\0A\00", align 1

declare i32 @puts(i8*)

define i32 @main() {
.entry:
  %0 = call i32 @puts(i8* getelementptr inbounds ([13 x i8]* @.str, i32 0, i32 0))
  ret i32 0
}

To build up this representation from LLVM C++ API calls is extremely cumbersome. A file that does exactly that (produced by the LLVM tool llc with the option -march=cpp) is 127 lines and 3761 bytes long. Here's how it declares the puts function using API calls:

Function* func_puts = mod->getFunction("puts");
if (!func_puts) {
  func_puts = Function::Create(
    /*Type=*/FuncTy_2,
    /*Linkage=*/GlobalValue::ExternalLinkage,
    /*Name=*/"puts", mod); // (external, no body)
  func_puts->setCallingConv(CallingConv::C);
}
AttributeSet func_puts_PAL;
func_puts->setAttributes(func_puts_PAL);

I hope you'll agree that this style of API is not particularly expressive, simple, or fun to use. It is too low level. The proposed library aims to simplify encoding programs dynamically by raising the level of abstraction.
but then it doesn't seem to offer anything to do with it?
With the intermediate representation in hand, doing things with it through the LLVM API is comparatively straightforward. For example, to JIT compile and invoke the main function above (assuming it resides in module m), we do the following:
ExecutionEngine * jit = EngineBuilder(m.ptr())
.setEngineKind(EngineKind::JIT)
.create();
void * fp = jit->getPointerToFunction(m->getFunction("main"));
auto main = reinterpret_cast<int32_t (*)()>(fp);
main();
Is it just a wrapper for LLVM AST API?
Essentially, yes. Though it's not really an AST, as discussed above. Like Boost.Python, this library makes the native API of some popular external library much easier (and more fun!) to use. -Andy
On 2015-11-19 04:41, Andy Jost wrote:
Andrey Semashev wrote:
I'm not sure I understand the purpose of this library. It seems to build an AST
It builds an intermediate representation of the program.
LLVM is a top-of-the-line compiler infrastructure. It does many things incredibly well but, unfortunately, does not provide an expressive API for defining programs. The API it provides is far too low-level.
Is it just a wrapper for LLVM AST API?
Essentially, yes. Though it's not really an AST, as discussed above. Like Boost.Python, this library makes the native API of some popular external library much easier (and more fun!) to use.
Thanks for the clarification. I wouldn't say Boost.Python is a fair comparison though as it is a binding to another programming language. This includes integrating C++ into Python as well as the other way around. I think your proposed library is rather incomplete. I mean, tools for building the intermediate bitcode are useful, but there has to be a way to use that bitcode somehow in terms of the library. You refer to LLVM API for that, but that immediately breaks the abstraction you've built with your library. Another question I had is this. Does your library only offer generation of the intermediate bitcode or also something else? Does it offer tools for traversal, analysis and transformation of the bitcode? Does it support reading the (high-level) code from some sort of input? For instance, can I build a static analyzer tool for C/C++ with your library?
Andrey Semashev wrote:
I wouldn't say Boost.Python is a fair comparison though as it is a binding to another programming language.
The LLVM code representation is a programming language. The "LLVM Language Reference Manual" can be found at http://www.llvm.org/docs/LangRef.html. But you're right, the comparison is not perfect.
This includes integrating C++ into Python
The comparison is strong in this direction. Given a boost::python::object p, p[i] generates a Python API call to index p at i. Given a proposed boost::llvm::value object v, v[i] generates an LLVM API call to index v at i.
as well as the other way around.
It's true, the original (more limited) proposal includes nothing for this direction. If, however, the proposal is expanded along the lines several have suggested, then some features would be added. For example, the library could load LLVM code from disk, link it against dynamic code, optimize the combination (mainly to inline functions), JIT-compile it, and run the result in the host program. In fact, this usage pattern might be commonplace, since most languages come with a runtime library. (The Curry compiler I wrote does this, except it writes an executable to disk rather than JITing code).
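For reference, the load-and-optimize step of that pattern looks roughly like this with the plain LLVM C++ API (LLVM 3.x era; exact signatures vary between releases). The file name is a placeholder, and linking the loaded module with dynamically generated code is left to LLVM's Linker API.

// Rough sketch with the plain LLVM API (3.x era; details vary by release):
// load a runtime library's bitcode from disk and run the inliner over it,
// as a step toward the load/link/optimize/JIT pattern described above.
// "runtime.bc" is a placeholder path.
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/LegacyPassManager.h>
#include <llvm/IR/Module.h>
#include <llvm/IRReader/IRReader.h>
#include <llvm/Support/SourceMgr.h>
#include <llvm/Support/raw_ostream.h>
#include <llvm/Transforms/IPO.h>
#include <llvm/Transforms/IPO/PassManagerBuilder.h>
#include <memory>

std::unique_ptr<llvm::Module> load_runtime(llvm::LLVMContext & ctx)
{
  llvm::SMDiagnostic err;
  auto m = llvm::parseIRFile("runtime.bc", err, ctx);   // bitcode built ahead of time
  if (!m) { err.print("load_runtime", llvm::errs()); return nullptr; }

  llvm::legacy::PassManager pm;
  llvm::PassManagerBuilder pmb;
  pmb.OptLevel = 2;
  pmb.Inliner = llvm::createFunctionInliningPass();     // mainly to inline runtime calls
  pmb.populateModulePassManager(pm);
  pm.run(*m);
  return m;                                             // link with generated code, then JIT
}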
I think your proposed library is rather incomplete. I mean, tools for building the intermediate bitcode are useful, but there has to be a way to use that bitcode somehow in terms of the library. You refer to LLVM API for that, but that immediately breaks the abstraction you've built with your library.
Yes, I agree. The library must target people who want to dynamically compile (and use!) code without learning the gory details of LLVM.
Another question I had is this. Does your library only offer generation of the intermediate bitcode or also something else?
Only generation; I don't think anything else is needed. The bitcode can easily be converted to assembly.
Does it offer tools for traversal, analysis and transformation of the bitcode?
These tasks are the main focus of LLVM. I think anyone who wants to do these things should learn LLVM and use its API. The proposed library should focus on (easily) constructing programs dynamically, since that task seems to have been mostly neglected by the LLVM community.
Does it support reading the (high-level) code from some sort of input?
The Clang library does.
For instance, can I build a static analyzer tool for C/C++ with your library?
For that, I think you would use Clang to compile C/C++ into bitcode and then perform the static analysis using the LLVM API. LLVM is beautiful for these things. -Andy
On 11/17/2015 07:35 PM, Andy Jost wrote:
Is there interest in an LLVM library? My motivation is to simplify the dynamic creation of functions, modules, and programs by defining an EDSL modeled after the C language. LLVM provides an API based exclusively on function calls that is verbose and difficult to use. The EDSL is more expressive and easier to use. (This may remind some people of Boost.Python and indeed these libraries are similar in motivation.)
The scope of this proposal is limited. The proposed library does not compile, link, optimize, save, or load anything, for example, since the LLVM API is suitable for that. Also, it may not support obscure features, even if they seem to be within its purview, since one can always make direct calls to the LLVM API.
Perhaps this is beyond the scope as well, but do you plan to allow integration of the JITted code with the calling program? For example, have it call existing functions, manipulate existing types (defined in C++, not the EDSL), and even inline functions from the calling program (perhaps consulting an intermediate representation of the calling program on disk)? These capabilities would allow JIT compilation of user-provided programs that run on user-provided data sets, for example in big data applications.
Avi Kivity wrote:
do you plan to allow integration of the JITted code with the calling program?
The LLVM JIT compiler will resolve functions from the host program. They must be externally defined in the embedded program.
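A small sketch of that arrangement, using the EDSL names from this thread; host_add is an invented example of a host-side function.

// Sketch only: host_add is an invented host function, and the EDSL calls
// (extern_, i32) follow the names used in this thread. The generated module
// merely declares the symbol; the LLVM JIT resolves it against the host
// executable at run time (the symbol must be exported, e.g. extern "C").
#include <cstdint>

extern "C" std::int32_t host_add(std::int32_t a, std::int32_t b)  // defined in the host program
{
  return a + b;
}

// Inside a module scope of the EDSL:
auto const host_add_ = extern_(i32(i32, i32), "host_add");  // declaration only, no body
// Calls to host_add_(x, y) in generated code bind to the host's host_add
// when the module is JIT compiled.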
and even inline functions from the calling program (perhaps consulting an intermediate representation of the calling program on disk)?
That is certainly possible, though perhaps difficult for a Boost library to do. I believe Clang has options for embedding a program's LLVM IR into its executable. Perhaps the JIT compiler is smart enough to use that if it's available. Anyway, it sounds like a good feature to consider for this library.
These capabilities would allow JIT compilation of user-provided programs that run on user-provided data sets, for example in big data applications.
Yes, it is certainly possible. I have written commercial software that does this (though not with this EDSL), so I can say with confidence that it is practical. Rather than inline functions automatically from the host program, I think it would be easier to build a runtime library to link and inline with user programs. It's easy to compile C/C++ to LLVM. -Andy
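For example (with placeholder file names), building such a runtime library is a couple of commands:

# Illustrative only -- file names are placeholders. Compile runtime sources to
# LLVM bitcode, then merge them into a single library that can later be linked
# with user programs and inlined at JIT time.
clang -O2 -emit-llvm -c rt_memory.c -o rt_memory.bc
clang++ -O2 -emit-llvm -c rt_strings.cpp -o rt_strings.bc
llvm-link rt_memory.bc rt_strings.bc -o runtime.bc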
On 11/20/2015 07:35 PM, Andy Jost wrote:
Avi Kivity wrote:
do you plan to allow integration of the JITted code with the calling program? The LLVM JIT compiler will resolve functions from the host program. They must be externally defined in the embedded program.
and even inline functions from the calling program (perhaps consulting an intermediate representation of the calling program on disk)? That is certainly possible, though perhaps difficult for a Boost library to do. I believe Clang has options for embedding a program's LLVM into its executable. Perhaps the JIT compiler is smart enough to use that if it's available. Anyway, it sounds like a good feature to consider for this library.
These capabilities would allow JIT compilation of user-provided programs that run on user-provided data sets, for example in big data applications. Yes, it is certainly possible. I have written commercial software that does this (though not with this EDSL), so I can say with confidence that it is practical. Rather than inline functions automatically from the host program, I think it would be easier to build a runtime library to link and inline with user programs. It's easy to compile C/C++ to LLVM.
Then consider me a user. I hope the "runtime library" you refer to is in some intermediate representation, having been partially compiled from C++ when the product is built, and then inlined into the generated code during JIT time. I'd hate to run the full C++ compiler during JIT time.
Avi Kivity wrote:
I hope the "runtime library" you refer to is in some intermediate representation, having been partially compiled from C++ when the product is built, and then inlined into the generated code during JIT time. I'd hate to run the full C++ compiler during JIT time.
Yes, it is exactly this way. It is easy to compile a runtime library written in C or C++ into a library of LLVM code. See https://github.com/andyjost/Sprite-3/blob/master/runtime/sprite-rt/C/Makefil... for an example. You may also be interested in linking compiled C++ and LLVM code generated by the EDSL into a single runtime library before the user code is available. See these files: https://github.com/andyjost/Sprite-3/blob/master/runtime/sprite-rt/llvm/Make... https://github.com/andyjost/Sprite-3/blob/master/runtime/sprite-rt/Makefile -Andy
On 11/22/2015 09:10 PM, Andy Jost wrote:
Avi Kivity wrote:
I hope the "runtime library" you refer to is in some intermediate representation, having been partially compiled from C++ when the product is built, and then inlined into the generated code during JIT time. I'd hate to run the full C++ compiler during JIT time. Yes, it is exactly this way. It is easy to compile a runtime library written in C or C++ into a library of LLVM code. See https://github.com/andyjost/Sprite-3/blob/master/runtime/sprite-rt/C/Makefil... for an example.
You may also be interested in linking compiled C++ and LLVM code generated by the EDSL into a single runtime library before the user code is available. See these files:
https://github.com/andyjost/Sprite-3/blob/master/runtime/sprite-rt/llvm/Make... https://github.com/andyjost/Sprite-3/blob/master/runtime/sprite-rt/Makefile
Cool, this is exactly what I want.
On 18/11/15 01:35, Andy Jost wrote:
Is there interest in an LLVM library? My motivation is to simplify the dynamic creation of functions, modules, and programs by defining an EDSL modeled after the C language. LLVM provides an API based exclusively on function calls that is verbose and difficult to use. The EDSL is more expressive and easier to use. (This may remind some people of Boost.Python and indeed these libraries are similar in motivation.)
I'll concede I haven't investigated or put much thought into this, but my first thought is: is there any overlap with the reflection proposals from SG7, at least from an API point of view? If so, go seek out Chandler, he's probably all over it and more than happy to look at current usage. Ben
Ben Pope wrote:
is there any overlap with the reflection proposals from SG7, at least from an API point of view?
There could be some similarity. For example, the library could provide an enum facility like this:

enum_ e("a", 1, "b", "c");
assert(e.size() == 3);
assert(e.get("a") == 1);
assert(e.get(1) == 2);

There's no need for enum_traits or enumerators, since we have the object e. Regarding class_traits, the LLVM code representation is approximately a platform independent assembly language. So, no inheritance, virtual functions, or templates. That's why I chose to create an EDSL for a C-like language. class_members could be relevant, though. -Andy
I could imagine that the library could be useful in the context of self-modifying rule-engines.
I'd really be interested in that. In boost or anywhere else.
Regards,
participants (9)
- Andrey Semashev
- Andy Jost
- Avi Kivity
- Ben Pope
- Felipe Magno de Almeida
- Jason Roehm
- Nelson, Erik - 2
- Oliver Kowalke
- Phil Endecott