From 0bcdf119688f065364fb9e14f9ad31d15dbcf1f2 Mon Sep 17 00:00:00 2001 From: Sean DeNigris Date: Tue, 19 May 2020 16:30:26 -0400 Subject: [PATCH] Update 1-callouts.pillar Re LibC's `time()`, the text says `(We leave it to the reader to look up the definition of the C function and determine the purpose of this argument.)`. While the exact mechanism of the function is tangential, I think it's important and non-obvious why one can and should pass `NULL` as the argument to the spec. Indeed, declarations on some hosts do not specify it, and the FFI callout seems to work in Pharo without it. If this is explained later in the booklet, we should mention that here; otherwise a more complete explanation would help. --- Chapters/1-callouts.pillar | 71 +++++++++++++++++++------------------- 1 file changed, 35 insertions(+), 36 deletions(-) diff --git a/Chapters/1-callouts.pillar b/Chapters/1-callouts.pillar index 80a449d..161e12b 100644 --- a/Chapters/1-callouts.pillar +++ b/Chapters/1-callouts.pillar @@ -1,13 +1,13 @@ !! Foreign Function Interface and Call\-Outs -This chapter presents a fair introduction to uFFI by introducing function call\-outs: calling out an external function. +This chapter presents a fair introduction to uFFI by introducing function call\-outs, i.e., calling out to external functions. We start by defining a Pharo uFFI binding to a C function. -This example will guide us to how uFFI manages to find and load libraries, and how it looks up functions in it. -Finally, when executing the binding, the returned value should be transformed to a Pharo object. +This example will guide us through how uFFI manages to find and load libraries, and how it looks up functions therein. +Finally, when executing the binding, the returned value should be transformed into a Pharo object. Such transformation is called marshalling. -In the second part of this chapter, we refactor the initial example to extract the library into a ==FFILibrary== object.
-A library object can cope with platform independent library lookup and smarther library searches. +In the second part of this chapter, we refactor the initial example to extract the library into an ==FFILibrary== object. +A library object can cope with platform\-independent library lookup and smarter library searches. !!! Calling a simple external function @@ -31,7 +31,7 @@ To call ==clock()== from Pharo using uFFI, we need to define a binding between a uFFI bindings are classes and methods that provide an object\-oriented means of accessing C libraries, implementing all the glue required to join the Pharo world and the C world. To write our first binding, let\'s start by defining a new class, ==FFITutorial==. -This class will act as a module and encapsulate not only the functions we want to call but also any state we would like to persist. +This class will act as a module and encapsulate not only the functions we want to call but also any state we would like to persist. To access the ==clock()== function, we then define a method in our ==FFITutorial== class using the ==ffiCall\:library\:== message to specify the declaration of the C function and indicate where it is defined. We will technically refer to this binding as a ''call\-out'', since it ''calls'' a function in the ''outside'' world (the C world). If our Pharo code is hosted on a Linux system, we define this class and (class\-side) method like so\: @@ -86,17 +86,17 @@ FFITutorial class >> ticksSinceStart [ ]]] This call\-out binding, a Pharo method, is called ==ticksSinceStart== and happens to be named differently than the C function we are calling. -Indeed, uFFI does not impose any restrictions as far as how to call your external functions. This can come in handy for decoupling your methods from underlying C\-level implementation details. +Indeed, uFFI does not impose any restrictions on naming your external function wrapper methods.
This can come in handy for decoupling your methods from underlying C\-level implementation details. -We invoke the C function using the Pharo method ==ffiCall\:library\:==, which is defined by uFFI. We provide the message arguments it needs usually by just copying and pasting the target C function declaration inside a Pharo Array literal, then referencing the name of the library in which it\'s defined (which in general will depend on our host platform). +We invoke the C function using the Pharo method ==ffiCall\:library\:==, which is defined by uFFI. We provide the arguments it needs, usually by just copying and pasting the target C function declaration inside a Pharo Array literal, then referencing the name of the library in which it\'s defined (which in general will depend on our host platform). -uFFI interprets the declaration and performs all the necessary work needed to make the call\-out and return the result. In general, -- uFFI searches for the specified library in the host system, -- On finding it, loads the C library into memory, -- Indexes the specified function within the library, -- Transforms and pushes Pharo arguments (if any) onto the stack, -- Performs the call to the C function, -- And finally transforms the return value into a Pharo object. +uFFI interprets the declaration and performs all the work needed to make the call\-out and return the result. In general, uFFI goes through the following steps\: +# Search for the specified library in the host system. +# On finding it, load the C library into memory. +# Look up the specified function within the library. +# Transform and push Pharo arguments (if any) onto the stack. +# Perform the call to the C function. +# Finally, transform the return value into a Pharo object.
To form the first argument in our example, we render our C declaration for ==clock()== in Pharo as a literal string array, like so\: @@ -104,32 +104,32 @@ To form the first argument in our example, we render our C declaration for ==clo #( uint clock() ) ]]] -The first element of our array is ==uint==, which is the function return ''type''. This is followed by the function name, ==clock==. Following the function name, we embed another Pharo Array to list the formal arguments the C function expects, in order. In this case, ==clock()== takes no arguments, so we must provide an empty Array. +The first element of our array is ==uint==, which is the function return ''type''. This is followed by the function name, ==clock==. Following the function name, we embed another Pharo ==Array== to list the formal arguments the C function expects, in order. In this case, ==clock()== takes no arguments, so we must provide an empty ==Array==. -Another way to think of the declaration argument is this\: If we look past the outer ==\#( )== wrapper, what we see inside is our C function prototype, appearing very similar to normal C syntax. This convenience is possible due to the coincidental nature of Smalltalk syntax\: our use of strings and array notation in Pharo nicely mirrors how we write a C function declaration. uFFI was intentionally designed to take advantage of this so that in most cases we can simply copy\-paste a C function declaration taken from a header file or documentation, wrap it in ==\#( )==, and it\'s ready for use\! +Another way to think of the declaration argument is this\: If we look past the outer ==\#( )== wrapper, what we see inside is our C function prototype, appearing very similar to normal C syntax. This convenience is possible thanks to a fortunate coincidence of Smalltalk syntax\: our use of strings and array notation in Pharo nicely mirrors how we write a C function declaration.
uFFI was intentionally designed to take advantage of this, so that in most cases we can simply copy\-paste a C function declaration taken from a header file or documentation, wrap it in ==\#( )==, and it\'s ready for use\! -Our ==ffiCall\:library\:== message also needs a second argument (==\'libc.so.6\'== in our Linux example), which is the name of the library in our host that contains the function. In many cases we do not need to provide a full path to the file in our host system. However, it should already be apparent that our bindings can be platform dependent if the library we need is also platform dependent. We will explore how to define bindings in a platform\-independent way in a following section. +Our ==ffiCall\:library\:== message also needs a second argument (==\'libc.so.6\'== in our Linux example), which is the name of the library on our host that contains the function. In many cases we do not need to provide a full path to the file on our host system. However, it should already be apparent that our bindings can be platform\-dependent if the library we need is also platform\-dependent. We will explore how to define bindings in a platform\-independent way in a following section. !!! Notes on Value Marshalling -To fully understand the previous example, we still need to explain how the C ==uint== return value (a non\-object\; a cluster of bytes popped off the stack) gets transformed into a Pharo ==SmallInteger== ''object''. Remember, C does not understand objects and does not do us the favor of returning values as attributes encapsulated within an object. We must somehow create an appropriate type of Pharo object, then migrate the C return value to become ''its'' value. Our code then receives this Pharo object. +To fully understand the previous example, we still need to explain how the C ==uint== return value (a non\-object\; a cluster of ''bytes'' handed back by the C function) gets transformed into a Pharo ==SmallInteger== ''object''.
Remember, C does not understand objects and does not do us the favor of returning values as attributes encapsulated within an object. We must somehow create an appropriate type of Pharo object, then initialize it with the C return value. Our code then receives this Pharo object. This process of converting values between different internal representations is called ''marshalling'', and in most cases is managed automatically in Pharo by uFFI. For example, uFFI internally maps the following standard C values to Smalltalk objects\: -- Types ==int==, ==uint==, ==long==, ==ulong== are marshalled into Pharo integers (small or long integers, depending on the platform architecture). +- Types ==int==, ==uint==, ==long==, ==ulong== are marshalled into Pharo integers (small or large integers, depending on the value and the platform architecture). - Types ==float== and ==double== are marshalled into Pharo floats. -Correct marshalling (and ''demarshalling'') of values is therefore crucial for correct behavior of the bindings, particularly because the C language is so closely tied to underlying machine architecture. And yet, C values are merely \"naked\" bits and bytes in registers and memory\; they have no inherent context or meaning. Consequently, they can be interpreted in many different ways, including by the Pharo run\-time engine. The correct interpretation, involving such issues as byte ordering, type size, alignment requirements, string length/termination, etc. must be knowable, known, and properly handled. An object can tell you what it is, but a string of bits is just a string of bits... +Correct marshalling (and ''demarshalling'') of values is therefore crucial for correct behavior of the bindings, particularly because the C language is so closely tied to underlying machine architecture. And yet, C values are merely \"naked\" bits and bytes in registers and memory\; they have no inherent context or meaning.
Consequently, the same bits can be interpreted in many different ways, whether by C code or by the Pharo run\-time engine. The correct interpretation, involving such issues as byte ordering, type size, alignment requirements, string length/termination, etc., must be knowable, known, and properly handled. An object can tell you what it is, but a string of bits is just a string of bits... As an example, consider the C integer value 0x00000000 (four contiguous \'0x00\' bytes). This can be interpreted as the small integer zero, as the ''false'' object, or as a null pointer \-\- all depending on the marshalling rule selected for the inferred type. This means that the developer coding the binding method needs to ''carefully'' and ''correctly'' describe the types of argument bindings so uFFI will then correctly interpret and transform those values. This is programming at the ABI level (binary representations), so precision counts\! You are working side\-by\-side with the compiler, and inattention to detail can lead to crashes (or strange behavior that can be difficult to diagnose). -In the following chapters we will explore the marshalling rules more in detail and see how they apply not only for return values but also for arguments. +In the following chapters we will explore the marshalling rules in more detail and see how they apply not only to return values but also to arguments. Moreover, we will learn how to define our own user\-defined data types and type mappings, allowing us to customize and fine\-tune the marshalling rules to fit our particular needs. !!! Libraries -We saw earlier that a call\-out binding requires us to specify a library that uFFI uses to locate and load the desired function. In our previous example, we indicated to Pharo that the ==clock()== function we need was inside the standard C library, namely, the file ==libc.so.6==. However, this form of the library exists in Linux systems, but not in Windows.
+We saw earlier that a call\-out binding requires us to specify the library that uFFI should use to locate and load the desired function. In our previous example, we indicated to Pharo that the ==clock()== function was inside the standard C library, namely, the file ==libc.so.6==. However, a library file by that name exists on Linux systems but not on Windows. -So we could say that this solution is not portable enough\: One of the hallmark qualities of Smalltalk is supposed to be platform independence. But if we want to load and run ''this'' code on a different host platform, we are faced with changing the library name to match the name on our new host system. Worse, the libraries we need will all too often not have the same name, nor be located in the same place on all platforms. Not only that, we would need to be sure we catch every instance of these kinds of dependencies when we perform this \"migration\". Ugh\! +So we could say that this solution is not portable enough. After all, one of the hallmark qualities of Smalltalk is supposed to be platform independence. But if we want to load and run ''this'' code on a different host platform, we are faced with changing the library name to match the name on our new host system. Worse, the libraries we need will all too often not have the same name, nor be located in the same place on all platforms. Not only that, we would need to be sure we catch every instance of these dependencies when we perform this \"migration\". Ugh\! One way to overcome this issue would be to define a set of bindings, one per platform, and decide which one to call based on which platform we detect at run\-time, as follows\: @@ -157,9 +157,9 @@ FFITutorial class >> ticksSinceStartWindows [ ] ]]] -But this solution means our binding code (which is essentially the same in all cases) gets repeated three times, and any changes to the binding design will require changing all three binding methods.
This may look simple enough for our ==clock()== binding, but repeating the code of complex bindings is likely not an optimal solution... +But this solution means our binding code (which is essentially the same in all cases) gets repeated three times, and any changes to the binding design will require changing all three binding methods. This may seem okay for our simple ==clock()== binding, but as bindings grow more complex, this code duplication will become a maintenance problem... -uFFI solves this problem by allowing us to use ''library objects'' instead of plain strings like we did earlier. A library object represents a library as an instance of ==FFILibrary==, abstracting away any platform dependencies. This library class defines methods ==macModuleName==, ==unixModuleName==, and ==win32ModuleName==\; uFFI internally selects the correct library name at run\-time after sensing the host platform. Bonus\: This selection is a ''process'', not a literal (a string), so it can now include behavior, such as the ability to dynamically search through different directories on your host system to locate the correct version of a library, as we will see shortly. +uFFI solves this problem by allowing us to use ''library objects'' instead of plain strings like we did earlier. A library object represents a library as an instance of ==FFILibrary==, abstracting away any platform dependencies. This library class defines methods ==macModuleName==, ==unixModuleName==, and ==win32ModuleName==\; uFFI internally selects the correct library name at run\-time (after sensing the host platform). Bonus\: This selection is a ''process'', not a literal (i.e., a string), so it can now include behavior, such as the ability to dynamically search through different directories on your host system to locate the correct version of a library, as we will see shortly.
So for our example, we can now define such a library, ==MyLibC==, as follows (being careful to note that the methods are ''instance side'' overrides)\: @@ -183,7 +183,7 @@ MyLibC >> win32ModuleName [ ] ]]] -To use this improved technique, we modify our ''original'' binding method (in *@OriginalTicksSinceStartBinding*) to substitute our library object (as a class) in place of the library name string\: +To use this improved technique, we modify our original binding method (in *@OriginalTicksSinceStartBinding*) to substitute our library ''object'' (as a class) in place of the library name ''string''\: [[[language=smalltalk FFITutorial class >> ticksSinceStart [ @@ -191,13 +191,13 @@ FFITutorial class >> ticksSinceStart [ ] ]]] -''This'' version will run on all three platform types, ''and'' do so without us having to repeat the same code multiple times. +''This'' version will run on all three platform types, ''and'' do so without duplicating the binding\: the function declaration and each platform\'s library name are each written exactly once. !!! Library Searching -The ==macModuleName==, ==unixModuleName==, and ==win32ModuleName== methods allow us, as developers, to employ different strategies to search for libraries and functions, depending on our host platform. If these methods return a relative path, library searching starts in common/default library directories on the system, or adjacent to the virtual machine executable. If they return an absolute path, default system locations will not be searched\; only the specified path will be. In either case, if the library is not found or cannot be loaded, an exception is raised. +The ==macModuleName==, ==unixModuleName==, and ==win32ModuleName== methods allow us, as developers, to employ different strategies to search for libraries, depending on our host platform. If these methods return a relative path, library searching starts in common/default library directories on the system, or adjacent to the virtual machine executable.
If they return an absolute path, only the specified path is searched, ''not'' the default system locations. In either case, if the library is not found or cannot be loaded, an exception is raised. -For example, an alternative override for ==unixModuleName== can limit the search for ==libc== to load only from the ==/usr/lib/== directory on the host this way\: +For example, an alternative override for ==unixModuleName== can restrict the search for ==libc== to the host\'s ==/usr/lib/== directory\: [[[language=smalltalk MyLibC >> unixModuleName [ @@ -207,7 +207,7 @@ MyLibC >> unixModuleName [ Moreover, we are not constrained to simply return a string containing a path. The use of a method allows us to define and follow complex search rules, potentially locating needed libraries dynamically. -To take a real\-world example, let\'s consider where the Cairo graphics library installs its resources on Unix\-type systems. Although they are generally compatible, different \'\*nix\' distros have evolved in ways that occasionally led to divergence in their file system structure, the placement of operating system files, and where they prefer to install packages the user may add. This is especially true (for historical reasons) where structure was added to avoid mixing 32\-bit and 64\-bit libraries. (Unix pre\-dates the 8\-bit micro\-computer age. It may be older than you are\!) +To take a real\-world example, let\'s consider where the Cairo graphics library installs its resources on Unix\-type systems. Although they are generally compatible, different \'\*nix\' distros have evolved in ways that occasionally led to divergence in their file system structure, the placement of operating system files, and where they prefer to install user\-added packages. This is especially true (for historical reasons) where structure was added to avoid mixing 32\-bit and 64\-bit libraries. (Unix pre\-dates the 8\-bit micro\-computer age. It may be older than you are\!)
In the example below, the Cairo library search method for Linux checks for the existence of the library in each of ==/usr/lib/i386\-linux\-gnu==, ==/usr/lib32==, and ==/usr/lib==, and if found, returns the absolute path to that file\: @@ -240,7 +240,7 @@ FFITutorial class >> time [ ] ]]] -Our new binding references the ==MyLibC== library we defined earlier, so the above structure couples that code to both our bindings. To avoid such undesirable coupling, we can choose to refactor the class reference into a single class method in our ==FFITutorial== class that can be used instead in both bindings. +Our new binding references the ==MyLibC== library we defined earlier, so both of our bindings now hard\-code a reference to that class. To avoid such undesirable coupling, we can refactor the class reference into a single class method of our ==FFITutorial== class that all such bindings can use instead. To continue our example, @@ -260,7 +260,7 @@ FFITutorial class >> time [ This strategy, however, is still not as neat as we would like it to be. Further refactoring could clean this up, but fortunately for us, uFFI provides the support we need for sharing library definitions between bindings. -Any class defining a binding also has the option of defining a default library by overriding the ==ffiLibrary== class method. Doing so allows us to omit a library definition altogether in our call\-out bindings. The library will be automatically referenced by uFFI via the default method definition. +Any class defining a binding has the option of defining a default library by overriding the ==ffiLibrary== class method. Doing so allows us to omit a library definition altogether in our call\-out bindings, as uFFI will then use this default library automatically. Let\'s see how this further simplies things for us\: @@ -282,7 +282,7 @@ Of course, bindings defining a library explicitly will necessarily override this !!!
Conclusion -In this chapter we have seen the basics of writing our own uFFI call\-outs. We declare an FFI binding to a C function by specifying the name of the function, its return type, its arguments, and the library the function belongs to. uFFI uses this information to load the library in memory, look up the function, demarshall our Pharo arguments to C types and push them, call the function, and marshall any C return values back into Pharo objects. +In this chapter, we have seen the basics of writing our own uFFI call\-outs. We declare an FFI binding to a C function by specifying the name of the function, its return type, its arguments, and the library to which the function belongs. uFFI uses this information to load the library into memory, look up the function, demarshall our Pharo arguments to C types and push them, call the function, and marshall any C return values back into Pharo objects. Here is the final version of our call\out method: [[[language=smalltalk FFITutorial class >> time [ @@ -290,8 +290,7 @@ FFITutorial class >> time [ ] ]]] -Since different platforms work differently, uFFI provides extensions to define a library as an object. -Library objects define per\-platform strategies to search for C libraries in the host file system. By specifying relative paths we let uFFI search for the library in a platform\'s standard locations, while absolute paths override such behavior. In addition, this mechanism allows developers to write bindings that can dynamically search for their libraries in multiple locations. +As we can see, treating libraries as objects keeps platform differences out of our bindings\: each library object encapsulates its own per\-platform strategy for locating its C library, whether that is a relative name searched for in standard locations, an absolute path, or a dynamic search across several directories. The next chapter covers function arguments of various types. Although we glossed over its details on purpose, the ==time== binding described in the previous section has a literal ==NULL== pointer argument.
We will see how literal arguments, which may be of different flavors, are very convenient syntactic sugar for specifying default argument values.