Dec 18, 2018 - Extending Python with C or C++

https://docs.python.org/dev/extending/extending.html

1. Extending Python with C or C++

It is quite easy to add new built-in modules to Python, if you know how to program in C. Such extension modules can do two things that can’t be done directly in Python: they can implement new built-in object types, and they can call C library functions and system calls.

To support extensions, the Python API (Application Programmers Interface) defines a set of functions, macros and variables that provide access to most aspects of the Python run-time system. The Python API is incorporated in a C source file by including the header "Python.h".

The compilation of an extension module depends on its intended use as well as on your system setup; details are given in later chapters.

1.1. A Simple Example

Let’s create an extension module called spam (the favorite food of Monty Python fans…) and let’s say we want to create a Python interface to the C library function system() [1]. This function takes a null-terminated character string as argument and returns an integer. We want this function to be callable from Python as follows:

>>> import spam
>>> status = spam.system("ls -l")

Begin by creating a file spammodule.c. (Historically, if a module is called spam, the C file containing its implementation is called spammodule.c; if the module name is very long, like spammify, the file name can be just spammify.c.) The first line of our file can be:

#include <Python.h>

which pulls in the Python API (you can add a comment describing the purpose of the module and a copyright notice if you like).

Note: Since Python may define some pre-processor definitions which affect the standard headers on some systems, you must include Python.h before any standard headers are included.

All user-visible symbols defined by Python.h have a prefix of Py or PY, except those defined in standard header files. For convenience, and since they are used extensively by the Python interpreter, "Python.h" includes a few standard header files: <stdio.h>, <string.h>, <errno.h>, and <stdlib.h>. If the latter header file does not exist on your system, it declares the functions malloc(), free() and realloc() directly.

The next thing we add to our module file is the C function that will be called when the Python expression spam.system(string) is evaluated (we’ll see shortly how it ends up being called):

static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = system(command);
    return PyLong_FromLong(sts);
}

There is a straightforward translation from the argument list in Python (for example, the single expression “ls -l”) to the arguments passed to the C function. The C function always has two arguments, conventionally named self and args.

The self argument points to the module object for module-level functions; for a method it would point to the object instance.

The args argument will be a pointer to a Python tuple object containing the arguments. Each item of the tuple corresponds to an argument in the call’s argument list. The arguments are Python objects; in order to do anything with them in our C function we have to convert them to C values. The function PyArg_ParseTuple() in the Python API checks the argument types and converts them to C values. It uses a template string to determine the required types of the arguments as well as the types of the C variables into which to store the converted values. More about this later.

PyArg_ParseTuple() returns true (nonzero) if all arguments have the right type and its components have been stored in the variables whose addresses are passed. It returns false (zero) if an invalid argument list was passed. In the latter case it also raises an appropriate exception so the calling function can return NULL immediately (as we saw in the example).

1.2. Intermezzo: Errors and Exceptions

An important convention throughout the Python interpreter is the following: when a function fails, it should set an exception condition and return an error value (usually a NULL pointer). Exceptions are stored in a static global variable inside the interpreter; if this variable is NULL no exception has occurred. A second global variable stores the "associated value" of the exception (the second argument to raise). A third variable contains the stack traceback in case the error originated in Python code. These three variables are the C equivalents of the result in Python of sys.exc_info() (see the section on module sys in the Python Library Reference). It is important to know about them to understand how errors are passed around.

The Python API defines a number of functions to set various types of exceptions.

The most common one is PyErr_SetString(). Its arguments are an exception object and a C string. The exception object is usually a predefined object like PyExc_ZeroDivisionError. The C string indicates the cause of the error and is converted to a Python string object and stored as the “associated value” of the exception.

Another useful function is PyErr_SetFromErrno(), which only takes an exception argument and constructs the associated value by inspection of the global variable errno. The most general function is PyErr_SetObject(), which takes two object arguments, the exception and its associated value. You don’t need to Py_INCREF() the objects passed to any of these functions.

You can test non-destructively whether an exception has been set with PyErr_Occurred(). This returns the current exception object, or NULL if no exception has occurred. You normally don’t need to call PyErr_Occurred() to see whether an error occurred in a function call, since you should be able to tell from the return value.

When a function f that calls another function g detects that the latter fails, f should itself return an error value (usually NULL or -1). It should not call one of the PyErr_*() functions; one has already been called by g. f’s caller is then supposed to also return an error indication to its caller, again without calling PyErr_*(), and so on; the most detailed cause of the error was already reported by the function that first detected it. Once the error reaches the Python interpreter’s main loop, this aborts the currently executing Python code and tries to find an exception handler specified by the Python programmer.

(There are situations where a module can actually give a more detailed error message by calling another PyErr_*() function, and in such cases it is fine to do so. As a general rule, however, this is not necessary, and can cause information about the cause of the error to be lost: most operations can fail for a variety of reasons.)
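The propagation rule is easier to see in a small model. The sketch below is a toy in plain C, not the CPython implementation: toy_error, toy_set_error(), toy_clear_error(), toy_g() and toy_f() are hypothetical names standing in for the interpreter's error indicator, a PyErr_*() setter, PyErr_Clear(), and the functions g and f from the text.

```c
#include <stddef.h>

/* Toy error indicator: NULL means "no error pending". */
static const char *toy_error = NULL;

static void toy_set_error(const char *msg) { toy_error = msg; }
static void toy_clear_error(void) { toy_error = NULL; }

/* g: the function that first detects the failure sets the indicator
   and returns the error value (NULL). */
static const int *toy_g(const int *value)
{
    if (*value < 0) {
        toy_set_error("negative value");
        return NULL;
    }
    return value;
}

/* f: sees that g failed and only passes the error value on.
   It does not set a new error, so g's detailed message survives. */
static const int *toy_f(const int *value)
{
    const int *result = toy_g(value);
    if (result == NULL)
        return NULL;
    return result;
}
```

A caller that wants to swallow the failure rather than pass it on would call toy_clear_error() before continuing, which mirrors the role the text gives to PyErr_Clear().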

To ignore an exception set by a function call that failed, the exception condition must be cleared explicitly by calling PyErr_Clear(). The only time C code should call PyErr_Clear() is if it doesn’t want to pass the error on to the interpreter but wants to handle it completely by itself (possibly by trying something else, or pretending nothing went wrong).

Every failing malloc() call must be turned into an exception — the direct caller of malloc() (or realloc()) must call PyErr_NoMemory() and return a failure indicator itself. All the object-creating functions (for example, PyLong_FromLong()) already do this, so this note is only relevant to those who call malloc() directly.

Also note that, with the important exception of PyArg_ParseTuple() and friends, functions that return an integer status usually return a positive value or zero for success and -1 for failure, like Unix system calls.

Finally, be careful to clean up garbage (by making Py_XDECREF() or Py_DECREF() calls for objects you have already created) when you return an error indicator!

The choice of which exception to raise is entirely yours. There are predeclared C objects corresponding to all built-in Python exceptions, such as PyExc_ZeroDivisionError, which you can use directly. Of course, you should choose exceptions wisely — don’t use PyExc_TypeError to mean that a file couldn’t be opened (that should probably be PyExc_IOError). If something’s wrong with the argument list, the PyArg_ParseTuple() function usually raises PyExc_TypeError. If you have an argument whose value must be in a particular range or must satisfy other conditions, PyExc_ValueError is appropriate.

You can also define a new exception that is unique to your module. For this, you usually declare a static object variable at the beginning of your file:

static PyObject *SpamError;

and initialize it in your module’s initialization function (PyInit_spam()) with an exception object (leaving out the error checking for now):

PyMODINIT_FUNC
PyInit_spam(void)
{
    PyObject *m;

    m = PyModule_Create(&spammodule);
    if (m == NULL)
        return NULL;

    SpamError = PyErr_NewException("spam.error", NULL, NULL);
    Py_INCREF(SpamError);
    PyModule_AddObject(m, "error", SpamError);
    return m;
}

Note that the Python name for the exception object is spam.error. The PyErr_NewException() function may create a class with the base class being Exception (unless another class is passed in instead of NULL), described in Built-in Exceptions.

Note also that the SpamError variable retains a reference to the newly created exception class; this is intentional! Since the exception could be removed from the module by external code, an owned reference to the class is needed to ensure that it will not be discarded, causing SpamError to become a dangling pointer. Should it become a dangling pointer, C code which raises the exception could cause a core dump or other unintended side effects.

We discuss the use of PyMODINIT_FUNC as a function return type later in this sample.

The spam.error exception can be raised in your extension module using a call to PyErr_SetString() as shown below:

static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = system(command);
    if (sts < 0) {
        PyErr_SetString(SpamError, "System command failed");
        return NULL;
    }
    return PyLong_FromLong(sts);
}

1.3. Back to the Example

Going back to our example function, you should now be able to understand this statement:

if (!PyArg_ParseTuple(args, "s", &command))
    return NULL;

It returns NULL (the error indicator for functions returning object pointers) if an error is detected in the argument list, relying on the exception set by PyArg_ParseTuple(). Otherwise the string value of the argument has been copied to the local variable command. This is a pointer assignment and you are not supposed to modify the string to which it points (so in Standard C, the variable command should properly be declared as const char *command).

The next statement is a call to the Unix function system(), passing it the string we just got from PyArg_ParseTuple():

sts = system(command);
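On POSIX systems the integer returned by system() is a wait status, not the command's exit code itself. The following is a minimal sketch in plain C, assuming a POSIX platform with <sys/wait.h>; toy_exit_code() is a hypothetical helper and is not part of the spam module.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run a shell command and return its exit code,
   or -1 if it could not be run or did not terminate normally. */
static int toy_exit_code(const char *command)
{
    int sts = system(command);
    if (sts == -1)
        return -1;                /* no child process could be created */
    if (WIFEXITED(sts))
        return WEXITSTATUS(sts);  /* normal termination: extract the code */
    return -1;                    /* e.g. terminated by a signal */
}
```

This also suggests why a test such as sts < 0 only catches the case where the command could not be started at all; a command that runs and exits with a nonzero code yields a nonnegative status.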

Our spam.system() function must return the value of sts as a Python object. This is done using the function PyLong_FromLong().

return PyLong_FromLong(sts);

In this case, it will return an integer object. (Yes, even integers are objects on the heap in Python!)

If you have a C function that returns no useful value (a function returning void), the corresponding Python function must return None. You need this idiom to do so (which is implemented by the Py_RETURN_NONE macro):

Py_INCREF(Py_None);
return Py_None;

Py_None is the C name for the special Python object None. It is a genuine Python object rather than a NULL pointer, which means “error” in most contexts, as we have seen.

1.4. The Module’s Method Table and Initialization Function

I promised to show how spam_system() is called from Python programs. First, we need to list its name and address in a "method table":

static PyMethodDef SpamMethods[] = {
    ...
    {"system",  spam_system, METH_VARARGS,
     "Execute a shell command."},
    ...
    {NULL, NULL, 0, NULL}        /* Sentinel */
};

Note the third entry (METH_VARARGS). This is a flag telling the interpreter the calling convention to be used for the C function. It should normally always be METH_VARARGS or METH_VARARGS | METH_KEYWORDS; a value of 0 means that an obsolete variant of PyArg_ParseTuple() is used.

When using only METH_VARARGS, the function should expect the Python-level parameters to be passed in as a tuple acceptable for parsing via PyArg_ParseTuple(); more information on this function is provided below.

The METH_KEYWORDS bit may be set in the third field if keyword arguments should be passed to the function. In this case, the C function should accept a third PyObject * parameter which will be a dictionary of keywords. Use PyArg_ParseTupleAndKeywords() to parse the arguments to such a function.
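The method table carries no length; its end is marked by the all-NULL sentinel entry. The sketch below shows that idea with a toy table in plain C. ToyMethodDef, toy_methods and toy_lookup() are hypothetical names used only for illustration; this is not how the interpreter itself processes a PyMethodDef array.

```c
#include <stddef.h>
#include <string.h>

typedef int (*toy_func)(int);

typedef struct {
    const char *name;   /* name visible to callers */
    toy_func func;      /* address of the implementation */
} ToyMethodDef;

static int toy_double(int x) { return 2 * x; }
static int toy_negate(int x) { return -x; }

static const ToyMethodDef toy_methods[] = {
    {"double", toy_double},
    {"negate", toy_negate},
    {NULL, NULL}        /* Sentinel: marks the end of the table */
};

/* Walk the table until the sentinel; no separate length is needed. */
static toy_func toy_lookup(const ToyMethodDef *table, const char *name)
{
    for (; table->name != NULL; table++) {
        if (strcmp(table->name, name) == 0)
            return table->func;
    }
    return NULL;        /* name not present in the table */
}
```

Forgetting the sentinel in such a table would let the loop run past the end of the array, which is why the entry is required.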

The method table must be referenced in the module definition structure:

static struct PyModuleDef spammodule = {
    PyModuleDef_HEAD_INIT,
    "spam",   /* name of module */
    spam_doc, /* module documentation, may be NULL */
    -1,       /* size of per-interpreter state of the module,
                 or -1 if the module keeps state in global variables. */
    SpamMethods
};

This structure, in turn, must be passed to the interpreter in the module’s initialization function. The initialization function must be named PyInit_name(), where name is the name of the module, and should be the only non-static item defined in the module file:

PyMODINIT_FUNC
PyInit_spam(void)
{
    return PyModule_Create(&spammodule);
}

Note that PyMODINIT_FUNC declares the function as PyObject * return type, declares any special linkage declarations required by the platform, and for C++ declares the function as extern "C".

When the Python program imports module spam for the first time, PyInit_spam() is called. (See below for comments about embedding Python.) It calls PyModule_Create(), which returns a module object, and inserts built-in function objects into the newly created module based upon the table (an array of PyMethodDef structures) found in the module definition. PyModule_Create() returns a pointer to the module object that it creates. It may abort with a fatal error for certain errors, or return NULL if the module could not be initialized satisfactorily. The init function must return the module object to its caller, so that it then gets inserted into sys.modules.

When embedding Python, the PyInit_spam() function is not called automatically unless there’s an entry in the PyImport_Inittab table. To add the module to the initialization table, use PyImport_AppendInittab(), optionally followed by an import of the module:

int
main(int argc, char *argv[])
{
    wchar_t *program = Py_DecodeLocale(argv[0], NULL);
    if (program == NULL) {
        fprintf(stderr, "Fatal error: cannot decode argv[0]\n");
        exit(1);
    }

    /* Add a built-in module, before Py_Initialize */
    PyImport_AppendInittab("spam", PyInit_spam);

    /* Pass argv[0] to the Python interpreter */
    Py_SetProgramName(program);

    /* Initialize the Python interpreter.  Required. */
    Py_Initialize();

    /* Optionally import the module; alternatively,
       import can be deferred until the embedded script
       imports it. */
    PyImport_ImportModule("spam");

    ...

    PyMem_RawFree(program);
    return 0;
}

Note: Removing entries from sys.modules or importing compiled modules into multiple interpreters within a process (or following a fork() without an intervening exec()) can create problems for some extension modules. Extension module authors should exercise caution when initializing internal data structures.

A more substantial example module is included in the Python source distribution as Modules/xxmodule.c. This file may be used as a template or simply read as an example.

Note: Unlike our spam example, xxmodule uses multi-phase initialization (new in Python 3.5), where a PyModuleDef structure is returned from PyInit_spam, and creation of the module is left to the import machinery. For details on multi-phase initialization, see PEP 489.

1.5. Compilation and Linkage

There are two more things to do before you can use your new extension: compiling and linking it with the Python system. If you use dynamic loading, the details may depend on the style of dynamic loading your system uses; see the chapters about building extension modules (chapter Building C and C++ Extensions) and additional information that pertains only to building on Windows (chapter Building C and C++ Extensions on Windows) for more information about this.

If you can’t use dynamic loading, or if you want to make your module a permanent part of the Python interpreter, you will have to change the configuration setup and rebuild the interpreter. Luckily, this is very simple on Unix: just place your file (spammodule.c for example) in the Modules/ directory of an unpacked source distribution, add a line to the file Modules/Setup.local describing your file:

spam spammodule.o

and rebuild the interpreter by running make in the toplevel directory. You can also run make in the Modules/ subdirectory, but then you must first rebuild Makefile there by running "make Makefile". (This is necessary each time you change the Setup file.)

If your module requires additional libraries to link with, these can be listed on the line in the configuration file as well, for instance:

spam spammodule.o -lX11

1.6. Calling Python Functions from C

So far we have concentrated on making C functions callable from Python. The reverse is also useful: calling Python functions from C. This is especially the case for libraries that support so-called "callback" functions. If a C interface makes use of callbacks, the equivalent Python often needs to provide a callback mechanism to the Python programmer; the implementation will require calling the Python callback functions from a C callback. Other uses are also imaginable.

Fortunately, the Python interpreter is easily called recursively, and there is a standard interface to call a Python function. (I won’t dwell on how to call the Python parser with a particular string as input; if you’re interested, have a look at the implementation of the -c command line option in Modules/main.c from the Python source code.)

Calling a Python function is easy. First, the Python program must somehow pass you the Python function object. You should provide a function (or some other interface) to do this. When this function is called, save a pointer to the Python function object (be careful to Py_INCREF() it!) in a global variable, or wherever you see fit. For example, the following function might be part of a module definition:

static PyObject *my_callback = NULL;

static PyObject *
my_set_callback(PyObject *dummy, PyObject *args)
{
    PyObject *result = NULL;
    PyObject *temp;

    if (PyArg_ParseTuple(args, "O:set_callback", &temp)) {
        if (!PyCallable_Check(temp)) {
            PyErr_SetString(PyExc_TypeError, "parameter must be callable");
            return NULL;
        }
        Py_XINCREF(temp);         /* Add a reference to new callback */
        Py_XDECREF(my_callback);  /* Dispose of previous callback */
        my_callback = temp;       /* Remember new callback */
        /* Boilerplate to return "None" */
        Py_INCREF(Py_None);
        result = Py_None;
    }
    return result;
}

This function must be registered with the interpreter using the METH_VARARGS flag; this is described in section The Module’s Method Table and Initialization Function. The PyArg_ParseTuple() function and its arguments are documented in section Extracting Parameters in Extension Functions.

The macros Py_XINCREF() and Py_XDECREF() increment/decrement the reference count of an object and are safe in the presence of NULL pointers (but note that temp will not be NULL in this context). More info on them in section Reference Counts.

Later, when it is time to call the function, you call the C function PyObject_CallObject(). This function has two arguments, both pointers to arbitrary Python objects: the Python function, and the argument list. The argument list must always be a tuple object, whose length is the number of arguments. To call the Python function with no arguments, pass in NULL, or an empty tuple; to call it with one argument, pass a singleton tuple. Py_BuildValue() returns a tuple when its format string consists of zero or more format codes between parentheses. For example:

int arg;
PyObject *arglist;
PyObject *result;
...
arg = 123;
...

/* Time to call the callback */
arglist = Py_BuildValue("(i)", arg);
result = PyObject_CallObject(my_callback, arglist);
Py_DECREF(arglist);

PyObject_CallObject() returns a Python object pointer: this is the return value of the Python function. PyObject_CallObject() is “reference-count-neutral” with respect to its arguments. In the example a new tuple was created to serve as the argument list, which is Py_DECREF()-ed immediately after the PyObject_CallObject() call.

The return value of PyObject_CallObject() is “new”: either it is a brand new object, or it is an existing object whose reference count has been incremented. So, unless you want to save it in a global variable, you should somehow Py_DECREF() the result, even (especially!) if you are not interested in its value.

Before you do this, however, it is important to check that the return value isn’t NULL. If it is, the Python function terminated by raising an exception. If the C code that called PyObject_CallObject() is called from Python, it should now return an error indication to its Python caller, so the interpreter can print a stack trace, or the calling Python code can handle the exception. If this is not possible or desirable, the exception should be cleared by calling PyErr_Clear(). For example:

if (result == NULL)
    return NULL; /* Pass error back */
...use result...
Py_DECREF(result);

Depending on the desired interface to the Python callback function, you may also have to provide an argument list to PyObject_CallObject(). In some cases the argument list is also provided by the Python program, through the same interface that specified the callback function. It can then be saved and used in the same manner as the function object. In other cases, you may have to construct a new tuple to pass as the argument list. The simplest way to do this is to call Py_BuildValue(). For example, if you want to pass an integral event code, you might use the following code:

PyObject *arglist;
...
arglist = Py_BuildValue("(l)", eventcode);
result = PyObject_CallObject(my_callback, arglist);
Py_DECREF(arglist);
if (result == NULL)
    return NULL; /* Pass error back */
/* Here maybe use the result */
Py_DECREF(result);

Note the placement of Py_DECREF(arglist) immediately after the call, before the error check! Also note that strictly speaking this code is not complete: Py_BuildValue() may run out of memory, and this should be checked.

You may also call a function with keyword arguments by using PyObject_Call(), which supports arguments and keyword arguments. As in the above example, we use Py_BuildValue() to construct the dictionary.

PyObject *dict;
PyObject *noargs;
...
/* PyObject_Call() requires a tuple for the positional arguments,
   so pass an empty tuple rather than NULL. */
noargs = Py_BuildValue("()");
dict = Py_BuildValue("{s:i}", "name", val);
result = PyObject_Call(my_callback, noargs, dict);
Py_DECREF(noargs);
Py_DECREF(dict);
if (result == NULL)
    return NULL; /* Pass error back */
/* Here maybe use the result */
Py_DECREF(result);

1.7. Extracting Parameters in Extension Functions

The PyArg_ParseTuple() function is declared as follows:

int PyArg_ParseTuple(PyObject *arg, const char *format, ...);

The arg argument must be a tuple object containing an argument list passed from Python to a C function. The format argument must be a format string, whose syntax is explained in Parsing arguments and building values in the Python/C API Reference Manual. The remaining arguments must be addresses of variables whose type is determined by the format string.

Note that while PyArg_ParseTuple() checks that the Python arguments have the required types, it cannot check the validity of the addresses of C variables passed to the call: if you make mistakes there, your code will probably crash or at least overwrite random bits in memory. So be careful!

Note that any Python object references which are provided to the caller are borrowed references; do not decrement their reference count!
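To make the pairing of format characters and variable addresses concrete, here is a toy format-driven parser in plain C. It is only a sketch of the idea and not how PyArg_ParseTuple() is implemented: toy_parse() is a hypothetical function that understands just 'i' (store through an int *) and 'l' (store through a long *), and reads its inputs from an array of long instead of a Python tuple.

```c
#include <stdarg.h>
#include <stddef.h>

/* Returns 1 on success, 0 if the format and the input count disagree
   or an unknown format character is met. */
static int toy_parse(const long *values, size_t count, const char *format, ...)
{
    va_list ap;
    size_t idx = 0;
    int ok = 1;

    va_start(ap, format);
    for (const char *f = format; *f != '\0'; f++) {
        if (idx >= count) {          /* format asks for more than was given */
            ok = 0;
            break;
        }
        if (*f == 'i') {             /* caller must have passed an int * */
            int *out = va_arg(ap, int *);
            *out = (int)values[idx++];
        } else if (*f == 'l') {      /* caller must have passed a long * */
            long *out = va_arg(ap, long *);
            *out = values[idx++];
        } else {
            ok = 0;
            break;
        }
    }
    if (ok && idx != count)          /* more inputs than format characters */
        ok = 0;
    va_end(ap);
    return ok;
}
```

The parser trusts the format string to describe the pointers it pulls from the variable argument list. Passing a long * where the format says 'i' would compile without complaint and then write through the wrong pointer type, which is the kind of mistake the warning above is about.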

Some example calls:

#define PY_SSIZE_T_CLEAN  /* Make "s#" use Py_ssize_t rather than int. */
#include <Python.h>

int ok;
int i, j;
long k, l;
const char *s;
Py_ssize_t size;

ok = PyArg_ParseTuple(args, ""); /* No arguments */
    /* Python call: f() */

ok = PyArg_ParseTuple(args, "s", &s); /* A string */
    /* Possible Python call: f('whoops!') */

ok = PyArg_ParseTuple(args, "lls", &k, &l, &s); /* Two longs and a string */
    /* Possible Python call: f(1, 2, 'three') */

ok = PyArg_ParseTuple(args, "(ii)s#", &i, &j, &s, &size);
    /* A pair of ints and a string, whose size is also returned */
    /* Possible Python call: f((1, 2), 'three') */

{
    const char *file;
    const char *mode = "r";
    int bufsize = 0;
    ok = PyArg_ParseTuple(args, "s|si", &file, &mode, &bufsize);
    /* A string, and optionally another string and an integer */
    /* Possible Python calls:
       f('spam')
       f('spam', 'w')
       f('spam', 'wb', 100000) */
}

{
    int left, top, right, bottom, h, v;
    ok = PyArg_ParseTuple(args, "((ii)(ii))(ii)",
             &left, &top, &right, &bottom, &h, &v);
    /* A rectangle and a point */
    /* Possible Python call:
       f(((0, 0), (400, 300)), (10, 10)) */
}

{
    Py_complex c;
    ok = PyArg_ParseTuple(args, "D:myfunction", &c);
    /* a complex, also providing a function name for errors */
    /* Possible Python call: myfunction(1+2j) */
}

1.8. Keyword Parameters for Extension Functions

The PyArg_ParseTupleAndKeywords() function is declared as follows:

int PyArg_ParseTupleAndKeywords(PyObject *arg, PyObject *kwdict,
                                const char *format, char *kwlist[], ...);

The arg and format parameters are identical to those of the PyArg_ParseTuple() function. The kwdict parameter is the dictionary of keywords received as the third parameter from the Python runtime. The kwlist parameter is a NULL-terminated list of strings which identify the parameters; the names are matched with the type information from format from left to right. On success, PyArg_ParseTupleAndKeywords() returns true, otherwise it returns false and raises an appropriate exception.

Note: Nested tuples cannot be parsed when using keyword arguments! Keyword parameters passed in which are not present in the kwlist will cause TypeError to be raised.

Here is an example module which uses keywords, based on an example by Geoff Philbrick (philbrick@hks.com):

#include "Python.h"

static PyObject *
keywdarg_parrot(PyObject *self, PyObject *args, PyObject *keywds)
{
    int voltage;
    const char *state = "a stiff";
    const char *action = "voom";
    const char *type = "Norwegian Blue";

    static char *kwlist[] = {"voltage", "state", "action", "type", NULL};

    if (!PyArg_ParseTupleAndKeywords(args, keywds, "i|sss", kwlist,
                                     &voltage, &state, &action, &type))
        return NULL;

    printf("-- This parrot wouldn't %s if you put %i Volts through it.\n",
           action, voltage);
    printf("-- Lovely plumage, the %s -- It's %s!\n", type, state);

    Py_RETURN_NONE;
}

static PyMethodDef keywdarg_methods[] = {
    /* The cast of the function is necessary since PyCFunction values
     * only take two PyObject* parameters, and keywdarg_parrot() takes
     * three.
     */
    {"parrot", (PyCFunction)(void(*)(void))keywdarg_parrot,
     METH_VARARGS | METH_KEYWORDS,
     "Print a lovely skit to standard output."},
    {NULL, NULL, 0, NULL}   /* sentinel */
};

static struct PyModuleDef keywdargmodule = {
    PyModuleDef_HEAD_INIT,
    "keywdarg",
    NULL,
    -1,
    keywdarg_methods
};

PyMODINIT_FUNC
PyInit_keywdarg(void)
{
    return PyModule_Create(&keywdargmodule);
}

1.9. Building Arbitrary Values

This function is the counterpart to PyArg_ParseTuple(). It is declared as follows:

PyObject *Py_BuildValue(const char *format, ...);

It recognizes a set of format units similar to the ones recognized by PyArg_ParseTuple(), but the arguments (which are input to the function, not output) must not be pointers, just values. It returns a new Python object, suitable for returning from a C function called from Python.

One difference with PyArg_ParseTuple(): while the latter requires its first argument to be a tuple (since Python argument lists are always represented as tuples internally), Py_BuildValue() does not always build a tuple. It builds a tuple only if its format string contains two or more format units. If the format string is empty, it returns None; if it contains exactly one format unit, it returns whatever object is described by that format unit. To force it to return a tuple of size 0 or one, parenthesize the format string.

Examples (to the left the call, to the right the resulting Python value):

Py_BuildValue("")                                   None
Py_BuildValue("i", 123)                             123
Py_BuildValue("iii", 123, 456, 789)                 (123, 456, 789)
Py_BuildValue("s", "hello")                         'hello'
Py_BuildValue("y", "hello")                         b'hello'
Py_BuildValue("ss", "hello", "world")               ('hello', 'world')
Py_BuildValue("s#", "hello", 4)                     'hell'
Py_BuildValue("y#", "hello", 4)                     b'hell'
Py_BuildValue("()")                                 ()
Py_BuildValue("(i)", 123)                           (123,)
Py_BuildValue("(ii)", 123, 456)                     (123, 456)
Py_BuildValue("(i,i)", 123, 456)                    (123, 456)
Py_BuildValue("[i,i]", 123, 456)                    [123, 456]
Py_BuildValue("{s:i,s:i}", "abc", 123, "def", 456)  {'abc': 123, 'def': 456}
Py_BuildValue("((ii)(ii)) (ii)", 1, 2, 3, 4, 5, 6)  (((1, 2), (3, 4)), (5, 6))

1.10. Reference Counts

In languages like C or C++, the programmer is responsible for dynamic allocation and deallocation of memory on the heap. In C, this is done using the functions malloc() and free(). In C++, the operators new and delete are used with essentially the same meaning and we’ll restrict the following discussion to the C case.

Every block of memory allocated with malloc() should eventually be returned to the pool of available memory by exactly one call to free(). It is important to call free() at the right time. If a block’s address is forgotten but free() is not called for it, the memory it occupies cannot be reused until the program terminates. This is called a memory leak. On the other hand, if a program calls free() for a block and then continues to use the block, it creates a conflict with re-use of the block through another malloc() call. This is called using freed memory. It has the same bad consequences as referencing uninitialized data — core dumps, wrong results, mysterious crashes.

Common causes of memory leaks are unusual paths through the code. For instance, a function may allocate a block of memory, do some calculation, and then free the block again. Now a change in the requirements for the function may add a test to the calculation that detects an error condition and can return prematurely from the function. It's easy to forget to free the allocated memory block when taking this premature exit, especially when it is added later to the code. Such leaks, once introduced, often go undetected for a long time: the error exit is taken only in a small fraction of all calls, and most modern machines have plenty of virtual memory, so the leak only becomes apparent in a long-running process that uses the leaking function frequently. Therefore, it's important to prevent leaks from happening by having a coding convention or strategy that minimizes this kind of error.
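One convention that helps here, offered as a sketch rather than anything the text above prescribes, is to route every exit through a single cleanup label so that an early return cannot skip free(). The function sum_positive() and its rule of rejecting negative values are invented purely for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: copy n ints into a scratch buffer, reject
 * negative values, and store the sum in *out.
 * Returns 0 on success, -1 on error. */
int sum_positive(const int *values, size_t n, long *out)
{
    int status = -1;                 /* assume failure until the end */
    long total = 0;
    int *scratch;

    assert(out != NULL);
    scratch = malloc((n ? n : 1) * sizeof *scratch);
    if (scratch == NULL)
        goto done;
    if (n > 0)
        memcpy(scratch, values, n * sizeof *scratch);

    for (size_t i = 0; i < n; i++) {
        if (scratch[i] < 0)
            goto done;               /* the early exit still reaches free() */
        total += scratch[i];
    }
    *out = total;
    status = 0;

done:
    free(scratch);                   /* free(NULL) is a no-op */
    return status;
}
```

Because the error test added later jumps to the same label as the normal path, there is only one place where the block is released.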

Since Python makes heavy use of malloc() and free(), it needs a strategy to avoid memory leaks as well as the use of freed memory. The chosen method is called reference counting. The principle is simple: every object contains a counter, which is incremented when a reference to the object is stored somewhere, and which is decremented when a reference to it is deleted. When the counter reaches zero, the last reference to the object has been deleted and the object is freed.

An alternative strategy is called automatic garbage collection. (Sometimes, reference counting is also referred to as a garbage collection strategy, hence my use of “automatic” to distinguish the two.) The big advantage of automatic garbage collection is that the user doesn’t need to call free() explicitly. (Another claimed advantage is an improvement in speed or memory usage — this is no hard fact however.) The disadvantage is that for C, there is no truly portable automatic garbage collector, while reference counting can be implemented portably (as long as the functions malloc() and free() are available — which the C Standard guarantees). Maybe some day a sufficiently portable automatic garbage collector will be available for C. Until then, we’ll have to live with reference counts.

While Python uses the traditional reference counting implementation, it also offers a cycle detector that works to detect reference cycles. This allows applications to not worry about creating direct or indirect circular references; these are the weakness of garbage collection implemented using only reference counting. Reference cycles consist of objects which contain (possibly indirect) references to themselves, so that each object in the cycle has a reference count which is non-zero. Typical reference counting implementations are not able to reclaim the memory belonging to any objects in a reference cycle, or referenced from the objects in the cycle, even though there are no further references to the cycle itself.
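The problem can be reproduced in a few lines of plain C. The sketch below is a toy model with made-up names (node, node_set_peer(), nodes_freed), not CPython's implementation: two nodes that hold owned references to each other each keep a count of 1 after every outside reference is gone, so pure reference counting never frees them.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model: a node that owns at most one reference to a peer. */
typedef struct node {
    long refcnt;
    struct node *peer;          /* owned reference, or NULL */
} node;

int nodes_freed = 0;            /* counts deallocations, for the demo */

/* Create a node; the caller owns the single initial reference. */
node *node_new(void)
{
    node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->refcnt = 1;
        n->peer = NULL;
    }
    return n;
}

/* Drop one reference; free the node when the count reaches zero. */
void node_decref(node *n)
{
    if (n == NULL)
        return;
    assert(n->refcnt > 0);
    if (--n->refcnt == 0) {
        node *peer = n->peer;
        nodes_freed++;
        free(n);
        node_decref(peer);      /* release the reference the node owned */
    }
}

/* Store an owned reference to target in n->peer (target may be NULL). */
void node_set_peer(node *n, node *target)
{
    node *old = n->peer;
    if (target != NULL)
        target->refcnt++;
    n->peer = target;
    node_decref(old);
}
```

If a and b point at each other and the program then drops its own references with node_decref(), nodes_freed stays at 0. Only breaking the cycle by hand, for example with node_set_peer(a, NULL), lets both counts reach zero.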

The cycle detector is able to detect garbage cycles and can reclaim them. The gc module exposes a way to run the detector (the collect() function), as well as configuration interfaces and the ability to disable the detector at runtime. The cycle detector is considered an optional component; though it is included by default, it can be disabled at build time using the --without-cycle-gc option to the configure script on Unix platforms (including Mac OS X). If the cycle detector is disabled in this way, the gc module will not be available.

1.10.1. Reference Counting in Python

There are two macros, Py_INCREF(x) and Py_DECREF(x), which handle the incrementing and decrementing of the reference count. Py_DECREF() also frees the object when the count reaches zero. For flexibility, it doesn't call free() directly; rather, it makes a call through a function pointer in the object's type object. For this purpose (and others), every object also contains a pointer to its type object.
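A toy model may make the mechanism easier to picture. The sketch below is plain C with hypothetical names (toy_object, toy_type, toy_incref(), toy_decref()); it only mirrors the idea described above, a counter plus a deallocation function reached through the type pointer, and is far simpler than the real macros.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_object toy_object;

/* Toy "type object": knows how to free instances of the type. */
typedef struct {
    const char *name;
    void (*dealloc)(toy_object *);
} toy_type;

struct toy_object {
    long refcnt;                 /* number of owned references */
    const toy_type *type;        /* every object points at its type */
};

int toy_freed = 0;               /* counts deallocations, for the demo */

void toy_int_dealloc(toy_object *op)
{
    toy_freed++;
    free(op);
}

const toy_type toy_int_type = { "toy_int", toy_int_dealloc };

/* Create an object; the caller owns the single initial reference. */
toy_object *toy_new(const toy_type *type)
{
    toy_object *op = malloc(sizeof *op);
    if (op != NULL) {
        op->refcnt = 1;
        op->type = type;
    }
    return op;
}

void toy_incref(toy_object *op)
{
    op->refcnt++;
}

void toy_decref(toy_object *op)
{
    assert(op->refcnt > 0);
    if (--op->refcnt == 0)
        op->type->dealloc(op);   /* indirect call, not free() directly */
}
```

The indirection is what lets each type release its own extra resources before the memory itself is returned.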

The big question now remains: when to use Py_INCREF(x) and Py_DECREF(x)? Let’s first introduce some terms. Nobody “owns” an object; however, you can own a reference to an object. An object’s reference count is now defined as the number of owned references to it. The owner of a reference is responsible for calling Py_DECREF() when the reference is no longer needed. Ownership of a reference can be transferred. There are three ways to dispose of an owned reference: pass it on, store it, or call Py_DECREF(). Forgetting to dispose of an owned reference creates a memory leak.

It is also possible to borrow [2] a reference to an object. The borrower of a reference should not call Py_DECREF(). The borrower must not hold on to the object longer than the owner from which it was borrowed. Using a borrowed reference after the owner has disposed of it risks using freed memory and should be avoided completely [3].

The advantage of borrowing over owning a reference is that you don’t need to take care of disposing of the reference on all possible paths through the code — in other words, with a borrowed reference you don’t run the risk of leaking when a premature exit is taken. The disadvantage of borrowing over owning is that there are some subtle situations where in seemingly correct code a borrowed reference can be used after the owner from which it was borrowed has in fact disposed of it.

A borrowed reference can be changed into an owned reference by calling Py_INCREF(). This does not affect the status of the owner from which the reference was borrowed — it creates a new owned reference, and gives full owner responsibilities (the new owner must dispose of the reference properly, as well as the previous owner).

1.10.2. Ownership Rules

Whenever an object reference is passed into or out of a function, it is part of the function's interface specification whether ownership is transferred with the reference or not.

Most functions that return a reference to an object pass on ownership with the reference. In particular, all functions whose function it is to create a new object, such as PyLong_FromLong() and Py_BuildValue(), pass ownership to the receiver. Even if the object is not actually new, you still receive ownership of a new reference to that object. For instance, PyLong_FromLong() maintains a cache of popular values and can return a reference to a cached item.

Many functions that extract objects from other objects also transfer ownership with the reference, for instance PyObject_GetAttrString(). The picture is less clear, here, however, since a few common routines are exceptions: PyTuple_GetItem(), PyList_GetItem(), PyDict_GetItem(), and PyDict_GetItemString() all return references that you borrow from the tuple, list or dictionary.

The function PyImport_AddModule() also returns a borrowed reference, even though it may actually create the object it returns: this is possible because an owned reference to the object is stored in sys.modules.

When you pass an object reference into another function, in general, the function borrows the reference from you — if it needs to store it, it will use Py_INCREF() to become an independent owner. There are exactly two important exceptions to this rule: PyTuple_SetItem() and PyList_SetItem(). These functions take over ownership of the item passed to them — even if they fail! (Note that PyDict_SetItem() and friends don’t take over ownership — they are “normal.”)

When a C function is called from Python, it borrows references to its arguments from the caller. The caller owns a reference to the object, so the borrowed reference’s lifetime is guaranteed until the function returns. Only when such a borrowed reference must be stored or passed on, it must be turned into an owned reference by calling Py_INCREF().

The object reference returned from a C function that is called from Python must be an owned reference — ownership is transferred from the function to its caller.

1.10.3. Thin Ice

There are a few situations where seemingly harmless use of a borrowed reference can lead to problems. These all have to do with implicit invocations of the interpreter, which can cause the owner of a reference to dispose of it.

The first and most important case to know about is using Py_DECREF() on an unrelated object while borrowing a reference to a list item. For instance:

void
bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);

    PyList_SetItem(list, 1, PyLong_FromLong(0L));
    PyObject_Print(item, stdout, 0); /* BUG! */
}

This function first borrows a reference to list[0], then replaces list[1] with the value 0, and finally prints the borrowed reference. Looks harmless, right? But it's not!

Let's follow the control flow into PyList_SetItem(). The list owns references to all its items, so when item 1 is replaced, it has to dispose of the original item 1. Now let's suppose the original item 1 was an instance of a user-defined class, and let's further suppose that the class defined a __del__() method. If this class instance has a reference count of 1, disposing of it will call its __del__() method.

Since it is written in Python, the __del__() method can execute arbitrary Python code. Could it perhaps do something to invalidate the reference to item in bug()? You bet! Assuming that the list passed into bug() is accessible to the __del__() method, it could execute a statement to the effect of del list[0], and assuming this was the last reference to that object, it would free the memory associated with it, thereby invalidating item.

The solution, once you know the source of the problem, is easy: temporarily increment the reference count. The correct version of the function reads:

void
no_bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);

    Py_INCREF(item);
    PyList_SetItem(list, 1, PyLong_FromLong(0L));
    PyObject_Print(item, stdout, 0);
    Py_DECREF(item);
}

This is a true story. An older version of Python contained variants of this bug and someone spent a considerable amount of time in a C debugger to figure out why his __del__() methods would fail...

The second case of problems with a borrowed reference is a variant involving threads. Normally, multiple threads in the Python interpreter can’t get in each other’s way, because there is a global lock protecting Python’s entire object space. However, it is possible to temporarily release this lock using the macro Py_BEGIN_ALLOW_THREADS, and to re-acquire it using Py_END_ALLOW_THREADS. This is common around blocking I/O calls, to let other threads use the processor while waiting for the I/O to complete. Obviously, the following function has the same problem as the previous one:

void
bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);
    Py_BEGIN_ALLOW_THREADS
    ...some blocking I/O call...
    Py_END_ALLOW_THREADS
    PyObject_Print(item, stdout, 0); /* BUG! */
}

1.10.4. NULL Pointers

In general, functions that take object references as arguments do not expect you to pass them NULL pointers, and will dump core (or cause later core dumps) if you do so. Functions that return object references generally return NULL only to indicate that an exception occurred. The reason for not testing for NULL arguments is that functions often pass the objects they receive on to other functions: if each function were to test for NULL, there would be a lot of redundant tests and the code would run more slowly.

It is better to test for NULL only at the “source:” when a pointer that may be NULL is received, for example, from malloc() or from a function that may raise an exception.

The macros Py_INCREF() and Py_DECREF() do not check for NULL pointers — however, their variants Py_XINCREF() and Py_XDECREF() do.

The macros for checking for a particular object type (Pytype_Check()) don’t check for NULL pointers — again, there is much code that calls several of these in a row to test an object against various different expected types, and this would generate redundant tests. There are no variants with NULL checking.

The C function calling mechanism guarantees that the argument list passed to C functions (args in the examples) is never NULL — in fact it guarantees that it is always a tuple [4].

It is a severe error to ever let a NULL pointer “escape” to the Python user.

1.11. Writing Extensions in C++

It is possible to write extension modules in C++. Some restrictions apply. If the main program (the Python interpreter) is compiled and linked by the C compiler, global or static objects with constructors cannot be used. This is not a problem if the main program is linked by the C++ compiler. Functions that will be called by the Python interpreter (in particular, module initialization functions) have to be declared using extern "C". It is unnecessary to enclose the Python header files in extern "C" {...}; they use this form already if the symbol __cplusplus is defined (all recent C++ compilers define this symbol).

1.12. Providing a C API for an Extension Module

Many extension modules just provide new functions and types to be used from Python, but sometimes the code in an extension module can be useful for other extension modules. For example, an extension module could implement a type "collection" which works like lists without order. Just like the standard Python list type has a C API which permits extension modules to create and manipulate lists, this new collection type should have a set of C functions for direct manipulation from other extension modules.

At first sight this seems easy: just write the functions (without declaring them static, of course), provide an appropriate header file, and document the C API. And in fact this would work if all extension modules were always linked statically with the Python interpreter. When modules are used as shared libraries, however, the symbols defined in one module may not be visible to another module. The details of visibility depend on the operating system; some systems use one global namespace for the Python interpreter and all extension modules (Windows, for example), whereas others require an explicit list of imported symbols at module link time (AIX is one example), or offer a choice of different strategies (most Unices). And even if symbols are globally visible, the module whose functions one wishes to call might not have been loaded yet!

Portability therefore requires making no assumptions about symbol visibility. This means that all symbols in extension modules should be declared static, except for the module's initialization function, in order to avoid name clashes with other extension modules (as discussed in section The Module's Method Table and Initialization Function). And it means that symbols that should be accessible from other extension modules must be exported in a different way.

Python provides a special mechanism to pass C-level information (pointers) from one extension module to another one: Capsules. A Capsule is a Python data type which stores a pointer (void *). Capsules can only be created and accessed via their C API, but they can be passed around like any other Python object. In particular, they can be assigned to a name in an extension module’s namespace. Other extension modules can then import this module, retrieve the value of this name, and then retrieve the pointer from the Capsule.

There are many ways in which Capsules can be used to export the C API of an extension module. Each function could get its own Capsule, or all C API pointers could be stored in an array whose address is published in a Capsule. And the various tasks of storing and retrieving the pointers can be distributed in different ways between the module providing the code and the client modules.

Whichever method you choose, it’s important to name your Capsules properly. The function PyCapsule_New() takes a name parameter (const char *); you’re permitted to pass in a NULL name, but we strongly encourage you to specify a name. Properly named Capsules provide a degree of runtime type-safety; there is no feasible way to tell one unnamed Capsule from another.

In particular, Capsules used to expose C APIs should be given a name following this convention:

modulename.attributename

The convenience function PyCapsule_Import() makes it easy to load a C API provided via a Capsule, but only if the Capsule's name matches this convention. This behavior gives C API users a high degree of certainty that the Capsule they load contains the correct C API.

The following example demonstrates an approach that puts most of the burden on the writer of the exporting module, which is appropriate for commonly used library modules. It stores all C API pointers (just one in the example!) in an array of void pointers which becomes the value of a Capsule. The header file corresponding to the module provides a macro that takes care of importing the module and retrieving its C API pointers; client modules only have to call this macro before accessing the C API.

The exporting module is a modification of the spam module from section A Simple Example. The function spam.system() does not call the C library function system() directly, but a function PySpam_System(), which would of course do something more complicated in reality (such as adding “spam” to every command). This function PySpam_System() is also exported to other extension modules.

The function PySpam_System() is a plain C function, declared static like everything else:

static int
PySpam_System(const char *command)
{
    return system(command);
}

The function spam_system() is modified in a trivial way:

static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = PySpam_System(command);
    return PyLong_FromLong(sts);
}

In the beginning of the module, right after the line

#include "Python.h"

two more lines must be added:

#define SPAM_MODULE
#include "spammodule.h"

The #define is used to tell the header file that it is being included in the exporting module, not a client module. Finally, the module's initialization function must take care of initializing the C API pointer array:

PyMODINIT_FUNC
PyInit_spam(void)
{
    PyObject *m;
    static void *PySpam_API[PySpam_API_pointers];
    PyObject *c_api_object;

    m = PyModule_Create(&spammodule);
    if (m == NULL)
        return NULL;

    /* Initialize the C API pointer array */
    PySpam_API[PySpam_System_NUM] = (void *)PySpam_System;

    /* Create a Capsule containing the API pointer array's address */
    c_api_object = PyCapsule_New((void *)PySpam_API, "spam._C_API", NULL);

    if (c_api_object != NULL)
        PyModule_AddObject(m, "_C_API", c_api_object);
    return m;
}

Note that PySpam_API is declared static; otherwise the pointer array would disappear when PyInit_spam() terminates!

The bulk of the work is in the header file spammodule.h, which looks like this:

#ifndef Py_SPAMMODULE_H
#define Py_SPAMMODULE_H
#ifdef __cplusplus
extern "C" {
#endif

/* Header file for spammodule */

/* C API functions */
#define PySpam_System_NUM 0
#define PySpam_System_RETURN int
#define PySpam_System_PROTO (const char *command)

/* Total number of C API pointers */
#define PySpam_API_pointers 1


#ifdef SPAM_MODULE
/* This section is used when compiling spammodule.c */

static PySpam_System_RETURN PySpam_System PySpam_System_PROTO;

#else
/* This section is used in modules that use spammodule's API */

static void **PySpam_API;

#define PySpam_System \
 (*(PySpam_System_RETURN (*)PySpam_System_PROTO) PySpam_API[PySpam_System_NUM])

/* Return -1 on error, 0 on success.
 * PyCapsule_Import will set an exception if there's an error.
 */
static int
import_spam(void)
{
    PySpam_API = (void **)PyCapsule_Import("spam._C_API", 0);
    return (PySpam_API != NULL) ? 0 : -1;
}

#endif

#ifdef __cplusplus
}
#endif

#endif /* !defined(Py_SPAMMODULE_H) */

All that a client module must do in order to have access to the function PySpam_System() is to call the function (or rather macro) import_spam() in its initialization function:

PyMODINIT_FUNC
PyInit_client(void)
{
    PyObject *m;

    m = PyModule_Create(&clientmodule);
    if (m == NULL)
        return NULL;
    if (import_spam() < 0)
        return NULL;
    /* additional initialization can happen here */
    return m;
}

The main disadvantage of this approach is that the file spammodule.h is rather complicated. However, the basic structure is the same for each function that is exported, so it has to be learned only once.

Finally it should be mentioned that Capsules offer additional functionality, which is especially useful for memory allocation and deallocation of the pointer stored in a Capsule. The details are described in the Python/C API Reference Manual in the section Capsules and in the implementation of Capsules (files Include/pycapsule.h and Objects/pycapsule.c in the Python source code distribution).

Footnotes

[1] An interface for this function already exists in the standard module os; it was chosen as a simple and straightforward example.
[2] The metaphor of "borrowing" a reference is not completely correct: the owner still has a copy of the reference.
[3] Checking that the reference count is at least 1 does not work: the reference count itself could be in freed memory and may thus be reused for another object!
[4] These guarantees don't hold when you use the "old" style calling convention, which is still found in much existing code.

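Dec 3, 2018 - A Concise Guide to the git Command-Line Tool, v0.2

(1) Introduction

  If you have never used git, you can hardly call yourself a software developer. Since its release, git has quickly become the most popular distributed version control system, and the launch of GitHub in 2008 made it more popular still.
  As a command-line enthusiast, I wrote this document to cover command-line git usage (mainly on Windows). It is an introductory tutorial that draws mainly on [Git教程](https://www.liaoxuefeng.com/wiki/0013739516305929606dd18361248578c67b8067c8c017b000). To learn more, I recommend 《Git权威指南》, which is itself open source and can be found at https://github.com/jiangxin/docker-gotgit. This tutorial is also hosted on GitHub and can be found on my personal site. My own knowledge is limited, so oversights and errors are inevitable; corrections are welcome.

What is git

  git is an open-source distributed version control system that handles projects of any size quickly and efficiently. Linus Torvalds originally wrote it to help manage Linux kernel development. Unlike common version control tools such as CVS and Subversion, git uses distributed repositories, so it does not depend on a central server to keep old versions of your files. Every machine can hold its own local version control store, which is just files on disk; we call it a repository. For collaboration you also need an online repository, such as the well-known GitHub, to synchronize code among several people. Because git is open source, you can also set up your own git server for version control inside a company.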

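Installing the client

  • Linux
    Most Linux distributions already include a git client. If running git on the command line succeeds, git is installed on your machine. If it is not installed, use the distribution's package manager; on Debian-based distributions, for example, run apt-get install git.
  • Windows
    git began as a Linux command-line tool, which differs considerably from the Windows shell, and early software that emulated a Linux command line on Windows had problems. The software has since matured, and the gitscm git client for Windows is now very usable. Download and install it from https://gitforwindows.org/, then run the Git Bash program to get a command line. Note that in Git Bash a local path such as C:\myfile is written as /C/myfile.

Basic configuration and directories

    git config --global user.name "My Name" 
    git config --global user.email myEmail@example.com

    After installing the client, first run these two commands to set your user name and email address, which identify you when you commit. The config command edits git's configuration files, of which there are three:

  • On Windows, gitconfig under the gitscm installation directory; on Linux, /etc/gitconfig. It holds values that apply to every user and every project on the system.
  • On Windows, .gitconfig in the user's home directory; on Linux, ~/.gitconfig. It applies only to the currently logged-in user.
  • .git/config inside a git project directory. It applies only to that project.

  For the same setting, precedence increases in the order listed: the project file overrides the user file, which overrides the system file.

  1. git init
      Change into an empty folder, open a Git Bash window, and type git init. This command initializes an empty local git repository. Afterwards the current directory contains a hidden .git folder; this folder is the git repository, and every object discussed below is stored in it. Once initialization is complete, you can create files in the folder and manage them with git.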

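(2) Basics

  The previous part covered installing git, the basic configuration commands, and the command that initializes a local repository. This part continues with git's basic concepts and operations. When you first learn git it can seem confusing, to the point of being misunderstood and misused, and hours of searching online may not help. It should not be that hard to understand, provided you know what each command actually does. Let's start with a few terms:

  • HEAD
    The cursor of the currently active branch, that is, your most recent commit on the current branch. The checkout command changes where HEAD points. A way to remember it: wherever you are now, HEAD points there, which is how Git knows where you are. HEAD is a built-in name with a fixed meaning and cannot be renamed. master and origin are only conventional names; you can choose your own.
  • master
    The name of the default branch when a repository is first created; in most cases master is the main line. What is master, concretely? It corresponds to a reference file in the .git directory, .git/refs/heads/master, whose content is the ID of the latest commit on that branch.
  • origin
    The default name of the remote repository.
  • Index
    The index, also called the staging area, is the set of files that will go into the next commit. It can also be thought of as a snapshot of the current file set.
  • Working Copy
    The working copy is the set of files you are currently working on.

  In the initial state, files are not managed by git; we call this the untracked state. A tracked file is in one of three states: committed, modified, or staged. Committed means the data is safely stored in the local database. Modified means the file has changed but has not yet been saved to the database. Staged means the current version of a modified file has been marked for inclusion in the next commit snapshot, as shown in the figure below: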

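The most basic Git commands

  1. git add <file>
      If the file is untracked, git starts tracking it and adds it to the staging area; if it is already tracked and has been modified, the modification is added to the staging area. git add . stages all new and modified files under the current directory.
  2. git status
      Running the command prints a report, for example:
      Read the report like this: Changes to be committed lists the staged changes that the next commit will record; Changes not staged for commit lists modifications in the working directory that have not yet been added to the staging area. The report also helpfully suggests commands for each case, which makes the output easier to follow. If it is too verbose, run git status -s instead: the first column in front of each file corresponds to Changes to be committed and the second to Changes not staged for commit, where M means modified, D means deleted, and A means newly added (not yet in the local repository). For the result in the figure above:
  3. git commit
      Commits the staging area to the local repository. A text editor starts so that you can enter the commit message. (By default git launches the program named by the shell's $EDITOR environment variable, usually vim or emacs. You can also set your preferred editor with git config --global core.editor, as described in the Getting Started material.)
      To commit without opening an editor, use git commit -m "commit message". You can also use git commit -a to commit modified tracked files straight from the working directory to the local repository (together with whatever is already staged). To make this easier to follow, here is the basic workflow in three steps:
    • When you first check out a branch, HEAD points to the latest commit on that branch. The file sets in the working copy, in HEAD, and in the index are identical. All three are in the same state and git is happy; git status reports nothing.
    • When you modify a file, Git notices and says: "Hey, a file has changed! Your working copy no longer matches the index and HEAD!" Git then marks the file as modified, and git status reports changes that are not yet staged.
    • When you run git add, the file is saved to the staging area and Git says: "OK, now your working copy and the index match, but they differ from HEAD!" git status now reports changes to be committed.
      When you run git commit, git creates a new commit and HEAD points to it; the index and working copy once again match HEAD exactly, and git is happy again.
  4. git diff <file>
      git diff reports the differences between the current file and the index, as shown in the figure below:
      Can't read the report? You can study the diff command; the report format is the same. To be honest, I find it hard to read too. Other uses:
    • git diff --cached [<path>...] compares the staging area with the latest version in the local repository (the content of the most recent commit)
    • git diff HEAD [<path>...] compares the working directory with the latest version in the local repository
      There are more forms; search online for them.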

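How to roll back

  Since this is version management, rolling back is of course possible. Each commit effectively saves a snapshot of all files, which version management calls a baseline. Releases are usually cut from a baseline, and if a committed version turns out to have a problem, you can roll back to it to investigate.

  1. git log
      After several commits, use this command to view the commit history. Note the string of hexadecimal digits in the output. Because git is a distributed system, this ID identifies a commit and is intended to be globally unique, so there are no conflicts later when you push to a remote server. In the rollback commands below you can type this ID (the first few digits are enough) to name the version to roll back to.
  2. git checkout -- <file> (scenario 1)
      Discards local changes, that is, undoes every modification to the file in the working directory. If the modification has not been staged, the file returns to exactly the state in the repository; if it was staged and then modified again, the file returns to the staged state.
  3. git reset HEAD <file> (scenario 2)
      Undoes the staging of a change (unstage) and puts it back in the working directory. HEAD here means the latest version. If you have both messed up a file in the working directory and added it to the staging area, discard the change in two steps: first run git reset HEAD <file>, which takes you back to scenario 1, then proceed as in scenario 1.
  4. git reset HEAD^ (scenario 3)
      If an unsuitable change has already been committed to the repository and you want to undo that commit, the only option is a version rollback, provided it has not been pushed to a remote repository. In the command, the previous version is HEAD^ and the one before that is HEAD^^. Writing 100 carets to go back 100 versions is easy to miscount, so write HEAD~100 instead. You can also give the hexadecimal commit ID directly.
      The three reset modes:
    • git reset --mixed: the default, used when git reset is run without a mode option. It rolls back to the given version, resetting the commit and index information while keeping the source files in the working directory
    • git reset --soft: rolls back to the given version but resets only the commit information, leaving the index untouched. To commit again, simply run commit
    • git reset --hard: rolls back completely to the given version; the local source files are also restored to that version's content (from the local repository), so some work may be lost. Use with care!
      To summarize the commit and rollback commands covered above: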

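Working with remote repositories

  The basics let us manage our own files conveniently on one machine. git's greater strength is collaboration, so the local repository must be able to upload to, download from, and merge with a remote repository; that way others can see and modify your code.

  1. git remote -v
      Lists the remote repositories. The output looks like this:
    origin https://github.com/schacon/ticgit (fetch)
    origin https://github.com/schacon/ticgit (push)
      The first column is the short name, followed by the URL and whether it is used for fetch or push.
  2. git remote add <shortname> <url>
      Adds a new remote Git repository and gives it a short name you can refer to easily, for example git remote add pb https://github.com/paulboone/ticgit.
  3. git fetch <shortname>
      After adding the remote above, you can run git fetch pb to fetch everything that is in Paul's repository but not yet in yours. Paul's master branch is then available locally as pb/master. The command contacts the remote repository and pulls down all the data you don't have yet. When it finishes, you have references to every branch in that remote, ready to merge or inspect at any time.
  4. git clone <url>
      The clone command copies a repository and automatically adds it as a remote with the short name "origin". So git fetch origin fetches all the work pushed since you cloned (or last fetched). Note that git fetch only downloads data into your local repository; it does not automatically merge it or modify your current work. You must merge it into your work by hand when you are ready.
  5. git pull
      Fetches a remote branch and then merges it into the current branch automatically. This may be a simpler or more comfortable workflow for you. By default, git clone sets up the local master branch to track the master branch (or whatever the default branch is called) of the cloned remote. Running git pull usually fetches data from the server you originally cloned from and tries to merge it into the branch you are currently on.
  6. git push [remote-name] [branch-name]
      When you want to share your project, you must push it upstream. To push the master branch to the origin server (again, cloning usually sets up both names for you), run the following command to back up your work to the server:
  7. git push origin master
      This command works only if you have write access to the server you cloned from and nobody has pushed in the meantime. If you and someone else clone at the same time and they push upstream first, your push will be rejected. You must first fetch their work and merge it into yours before you can push.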
  8. git remote show [remote-name] / git remote show origin
      $ git remote show origin
     Fetch URL: https://github.com/schacon/ticgit
      Push  URL: https://github.com/schacon/ticgit
      HEAD branch: master
      Remote branches:
     master                               tracked
     dev-branch                           tracked
      Local branch configured for 'git pull':
     master merges with remote master
      Local ref configured for 'git push':
     master pushes to master (up to date)
    

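      It also lists the remote repository's URL and the tracking branch information. This is useful: it tells you that you are on the master branch and that running git pull will fetch all remote references and then merge the remote master branch into the local master branch. It also lists all the remote references it has fetched.

  9. git remote rename
      To change a remote's short name, run git remote rename. For example, to rename pb to paul, use git remote rename pb paul. Note that this also changes your remote branch names: what used to be pb/master is now paul/master.
  10. git remote rm
      If you want to remove a remote for some reason (you have moved away from the server, you no longer use a particular mirror, or a contributor has stopped contributing), use git remote rm paul. Finally, the figure below summarizes the various git operations.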

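(3) Advanced

  git's power lies in its distributed architecture and its efficient branching, which support projects from the simple to the complex. The features in this part handle complex projects and multi-person collaboration; once you have learned them, you can manage the code of a complex project with ease.

Branch management

A branch is a line of work you create off the main line of development so that you can keep committing without affecting the main line. When development is finished, you merge it back into the original main branch in one go, which is both safe and does not disturb anyone else's work.

  1. git checkout -b dev
    Creates the branch dev. The -b option means create and switch, which is equivalent to the following two commands: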
    $ git branch dev
    $ git checkout dev
    

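    The state at this point is shown in the figure:

  2. git branch (list branches)
    Now modify the files in the working directory and commit. The state at this point is shown in the figure:

    With the work on the dev branch finished, we can switch back to the master branch: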
    git checkout master
    

    The state at this point is shown in the figure:

  3. git merge (merge branches)
    Now we merge the work on the dev branch into the master branch:
    $ git merge dev
    Updating d46f35e..b17d20e
    Fast-forward
     readme.txt | 1 +
     1 file changed, 1 insertion(+)
    

    The state at this point is shown in the figure:

  4. git branch -d (delete a branch)
    $ git branch -d dev
    Deleted branch dev (was b17d20e).
    

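    Once merged, the dev branch can be deleted safely. If a branch has not been merged yet, deleting it would lose its changes, so git reports an error. To force the deletion, use the uppercase -D option: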

    $ git branch -D dev
    Deleted branch dev (was 287773e).
    
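  5. Branching strategy
    When merging, Git uses fast-forward mode whenever it can, but in this mode the branch information is lost once the branch is deleted. If you disable fast-forward, Git creates a new commit at merge time, so the branch remains visible in the history.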
    $ git merge --no-ff -m "merge with no-ff" dev
    Merge made by the 'recursive' strategy.
     readme.txt | 1 +
     1 file changed, 1 insertion(+)
    

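    The state at this point is shown in the figure:

Resolving conflicts

  When different branches have modified the same file, Git cannot do a fast-forward merge and can only try to combine the two sets of changes. Such a merge may conflict, in which case you must resolve the conflict first and then commit to complete the merge.
  Suppose readme.txt conflicts between the feature1 and master branches. When merging into master: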

$ git merge feature1
Auto-merging readme.txt
CONFLICT (content): Merge conflict in readme.txt
Automatic merge failed; fix conflicts and then commit the result.

git status shows the result of the merge:

$ git status
On branch master
Your branch is ahead of 'origin/master' by 2 commits.
  (use "git push" to publish your local commits)

You have unmerged paths.
  (fix conflicts and run "git commit")
  (use "git merge --abort" to abort the merge)

Unmerged paths:
  (use "git add <file>..." to mark resolution)

    both modified:   readme.txt

no changes added to commit (use "git add" and/or "git commit -a")

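Open readme.txt and you will see that Git marks each branch's content with <<<<<<<, =======, and >>>>>>>. Edit the file to the content you want, save it, and commit again: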

$ git add readme.txt 
$ git commit -m "conflict fixed"
[master cf810e4] conflict fixed

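The state at this point is shown in the figure:

git log with suitable options also shows how the branches were merged. Finally, delete the feature1 branch.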

$ git branch -d feature1
Deleted branch feature1 (was 14096d0).

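Summary: when Git cannot merge branches automatically, you must resolve the conflict first, then commit to complete the merge. Resolving a conflict means hand-editing the files Git failed to merge into the content you want and then committing.
The git log --graph command shows the branch merge graph.

stash

  Suppose you need to fix a bug. Naturally, you want to create a branch for the fix. But the work in progress on dev has not been committed yet; what now? Fortunately, Git provides a stash feature that stores the current working state away so that you can restore it later and carry on: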

$ git stash
Saved working directory and index state WIP on dev: f52c633 add merge

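  Now git status shows a clean working directory (unless there are files Git does not manage), so you can safely create a branch to fix the bug. First decide which branch the fix belongs on; assuming it is master, create a temporary branch from master. When the fix is done, switch to master, merge, and delete the temporary branch.
  Switch back to the dev branch and run git status: the working directory is clean. Where did the work in progress go? Check with git stash list: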

$ git stash list
stash@{0}: WIP on dev: f52c633 add merge

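  The work is still there; Git has stored the stash somewhere, but it needs to be restored. There are two ways:

  • git stash apply restores it but does not delete the stash; you need git stash drop to delete it;
  • git stash pop restores it and deletes the stash at the same time.
      You can stash several times. To restore, first run git stash list, then restore a specific stash with: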
    git stash apply stash@{0}
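
The difference between apply, pop, and drop can be pictured with a small Python model. This is only a sketch of the bookkeeping, not of how Git stores stashes: the StashModel class and its method names are hypothetical, and index 0 stands for stash@{0}, the most recent entry.

```python
class StashModel:
    """Toy model of the stash list; index 0 plays the role of stash@{0}."""

    def __init__(self):
        self.entries = []

    def stash(self, work):
        # The newest entry always becomes stash@{0}.
        self.entries.insert(0, work)

    def apply(self, index=0):
        # Restore the work but keep the entry in the list.
        return self.entries[index]

    def drop(self, index=0):
        # Delete an entry without restoring it.
        del self.entries[index]

    def pop(self, index=0):
        # Restore the work and delete the entry in one step.
        return self.entries.pop(index)


stashes = StashModel()
stashes.stash("WIP on dev: first change")
stashes.stash("WIP on dev: second change")

restored = stashes.apply(0)                 # entry stays in the list
count_after_apply = len(stashes.entries)
popped = stashes.pop(0)                     # entry is removed
count_after_pop = len(stashes.entries)
print(restored, count_after_apply, popped, count_after_pop)
```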
    

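Collaboration

  1. Pushing branches: pushing a branch sends all of its local commits to the remote repository. Specify the local branch when pushing, and Git pushes it to the corresponding branch on the remote: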
    $ git push origin dev
    
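  2. Fetching branches: when several people collaborate, everyone pushes changes to the master and dev branches. When a teammate clones from the remote, by default they only see a local master branch. To develop on dev, they must create a local dev branch from the remote origin/dev, using this command: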
    $ git checkout -b dev origin/dev
    

    They can now keep working on dev and push the dev branch to the remote from time to time:

    $ git push origin dev
    

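    A push can also run into conflicts, which have to be resolved before pushing again, so collaboration usually works like this:

    1. First, try to push your changes with git push origin <branch-name>;
    2. If the push fails because the remote branch is ahead of your local one, run git pull to try to merge;
    3. If the merge conflicts, resolve the conflicts and commit locally;
    4. Once there are no conflicts, or they have been resolved, git push origin <branch-name> succeeds;
    5. If git pull reports "no tracking information", the local branch is not linked to a remote branch; create the link with git branch --set-upstream-to=origin/<branch-name> <branch-name>.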

Rebase

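  git rebase replays the commits of the current branch on top of another branch. Used sensibly, it keeps the commit history clean and simple. Suppose you create a branch called "mywork" based on the remote branch "origin".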

$ git checkout -b mywork origin

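Suppose the remote branch "origin" already has two commits, as shown in the figure.

Now we make some changes on this branch and create two commits, while other people also commit changes to the "origin" branch. The two branches "origin" and "mywork" have each moved forward and have diverged, as shown in the figure:

At this point you can use "pull" to bring down the changes on "origin" and merge them with yours; the result is a new merge commit.
If instead you want the history of "mywork" to look as if no merge ever happened, you can use git rebase: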

$ git checkout mywork
$ git rebase origin

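  These commands remove each commit from your "mywork" branch and save them temporarily as patches (stored in the ".git/rebase" directory; the exact directory name differs between Git versions), then update "mywork" to the latest "origin" branch, and finally apply the saved patches to "mywork".

  A conflict may occur during the rebase. In that case Git stops the rebase and lets you resolve the conflict. After resolving it, use "git add" to update the index with the resolved content. You then do not need to run git commit; just run: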

$ git rebase --continue

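  Git then continues applying the remaining patches. At any time you can use the --abort option to stop the rebase, and the "mywork" branch returns to the state it was in before the rebase started.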

$ git rebase --abort

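  Rebase straightens diverged local commits that have not been pushed into a single line. The point is to make changes easier to follow when reading the history, because diverged commits require a three-way comparison.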

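Branch Management Strategy

  In real development, branches should be managed according to a few basic principles.

  1. First, the master branch should be very stable: it is used only to release new versions, and no day-to-day work happens on it. Day-to-day work happens on the dev branch. At some point, for example when version 1.0 is released, dev is merged into master and version 1.0 is released from master;
  2. You and your teammates all work off dev: each person has their own branch and merges it into dev from time to time.
  3. For each new feature, it is best to create a feature branch, develop on it, merge it when finished, and then delete the feature branch.
  4. Not every local branch has to be pushed to the remote. Which ones should be pushed?
    • master is the main branch, so it must always be kept in sync with the remote;
    • dev is the development branch that the whole team works on, so it also needs to be kept in sync with the remote;
    • bug branches are only for fixing bugs locally, so there is no need to push them, unless your boss wants to see how many bugs you fix each week;
    • whether a feature branch is pushed depends on whether you are developing it together with teammates.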

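Using Tags

  When releasing a version, we usually first create a tag in the repository, which uniquely identifies the version at the moment of tagging. Whenever that tag is checked out later, you get the historical version from that moment, so a tag is also a snapshot of the repository. Although a tag is a snapshot, it is really just a pointer to a commit (much like a branch, except that a branch can move and a tag cannot).
  Git already has commits, so why introduce tags? A commit ID such as 6a5819e… is a string of characters that is hard to remember and communicate, whereas a tag is a meaningful, memorable name bound to a commit.
Creating a tag in Git is simple. First, switch to the branch you want to tag: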

$ git branch
* dev
  master
$ git checkout master
Switched to branch 'master'

  Then run git tag <name> to create a new tag:

$ git tag v1.0

  Run git tag on its own to list all tags:

$ git tag
v1.0

  You can also tag an earlier commit by looking up its commit id. For example, if the commit id is f52c633, run:

$ git tag v0.9 f52c633

  Run git tag again to list the tags:

$ git tag
v0.9
v1.0

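  Note that tags are listed in alphabetical order, not chronological order. Use git show <tagname> to see information about a tag. You can also create a tag with a description, using -a for the tag name and -m for the message: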

$ git tag -a v0.1 -m "version 0.1 released" 1094adb
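
On the listing order: by default git tag sorts names as text, so version-like names do not necessarily come out in version order. The Python sketch below reproduces that text ordering with sorted(); the tag v0.10 and the version_key helper are hypothetical additions for illustration, not part of Git.

```python
tags = ["v1.0", "v0.9", "v0.10"]  # v0.10 is a made-up tag for this example

# Plain text ordering compares character by character, so "v0.10"
# sorts before "v0.9" because "1" < "9".
text_order = sorted(tags)


def version_key(tag):
    """Turn 'v0.10' into (0, 10) so that the parts compare numerically."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


version_order = sorted(tags, key=version_key)

print(text_order)
print(version_order)
```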

  If a tag was created by mistake, it can be deleted:

$ git tag -d v0.1
Deleted tag 'v0.1' (was f15b0dd)

  To push a tag to the remote, use git push origin <tagname>:

$ git push origin v1.0
Total 0 (delta 0), reused 0 (delta 0)
To github.com:michaelliao/learngit.git
 * [new tag]         v1.0 -> v1.0

  Or push all local tags that have not yet been pushed in one go:

$ git push origin --tags
Total 0 (delta 0), reused 0 (delta 0)
To github.com:michaelliao/learngit.git
 * [new tag]         v0.9 -> v0.9

  Deleting a tag that has already been pushed is a little more work. First delete it locally:

$ git tag -d v0.9
Deleted tag 'v0.9' (was f52c633)

  Then delete it from the remote. The delete command is also a push, in this form:

$ git push origin :refs/tags/v0.9
To github.com:michaelliao/learngit.git
 - [deleted]         v0.9

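Other Tips

  1. Ignoring special files
    Create a special .gitignore file in the root of the Git working directory and list the file names to ignore; Git then ignores those files automatically. There is no need to write .gitignore from scratch: GitHub provides templates for many setups that only need to be combined. They can all be browsed online: https://github.com/github/gitignore
  2. Do you often mistype commands such as git status? The word status is not easy to remember. It would be much simpler if git st meant git status, and this kind of laziness is strongly encouraged. A single command tells Git that st means status from now on: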
    $ git config --global alias.st status
    

    Other commands can be shortened in the same way; many people use co for checkout, ci for commit, and br for branch.

  3. Changing the default editor
    Some git commands launch the vim editor, but I find vim hard to use and prefer Sublime, so I run:
    git config --global core.editor 'C:\\Program Files\\Sublime Text 3\\sublime_text.exe'
    

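    Git now launches Sublime automatically; you can substitute whichever editor you prefer. To use Sublime conveniently from the command line, go to your home directory (with cd), open the .bashrc file, and add this line: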

    alias sublime="/c/Program\ Files/Sublime\ Text\ 3/sublime_text.exe"
    

    Typing sublime on the command line now opens the Sublime editor. (End)

  For more of my original articles, follow my personal WeChat official account.

Jul 24, 2018 - Tutorial: Using Motor With Tornado

Contents

Tutorial: Using Motor With Tornado

  • Tutorial Prerequisites
  • Object Hierarchy
  • Creating a Client
  • Getting a Database
  • Tornado Application Startup Sequence
  • Getting a Collection
  • Inserting a Document
  • Getting a Single Document With find_one()
  • Querying for More Than One Document
    • async for
    • Iteration in Python 3.4
  • Counting Documents
  • Updating Documents
  • Removing Documents
  • Commands
  • Further Reading

Tutorial Prerequisites

You can learn about MongoDB with the MongoDB Tutorial before you learn Motor. Install pip and then do:

$ pip install tornado motor

Once done, the following should run in the Python shell without raising an exception:

>>> import motor.motor_tornado

This tutorial also assumes that a MongoDB instance is running on the default host and port. Assuming you have downloaded and installed MongoDB, you can start it like so:

$ mongod

Object Hierarchy

Motor, like PyMongo, represents data with a 4-level object hierarchy:

  • MotorClient represents a mongod process, or a cluster of them. You explicitly create one of these client objects, connect it to a running mongod or mongods, and use it for the lifetime of your application.
  • MotorDatabase: Each mongod has a set of databases (distinct sets of data files on disk). You can get a reference to a database from a client.
  • MotorCollection: A database has a set of collections, which contain documents; you get a reference to a collection from a database.
  • MotorCursor: Executing find() on a MotorCollection gets a MotorCursor, which represents the set of documents matching a query.

Creating a Client

You typically create a single instance of MotorClient at the time your application starts up.

>>> client = motor.motor_tornado.MotorClient()

This connects to a mongod listening on the default host and port. You can specify the host and port like:

>>> client = motor.motor_tornado.MotorClient('localhost', 27017)

Motor also supports connection URIs:

>>> client = motor.motor_tornado.MotorClient('mongodb://localhost:27017')

Connect to a replica set like:

>>> client = motor.motor_tornado.MotorClient('mongodb://host1,host2/?replicaSet=my-replicaset-name')

Getting a Database

A single instance of MongoDB can support multiple independent databases. From an open client, you can get a reference to a particular database with dot-notation or bracket-notation:

>>> db = client.test_database
>>> db = client['test_database']

Creating a reference to a database does no I/O and does not require an await expression.

Tornado Application Startup Sequence

Now that we can create a client and get a database, we’re ready to start a Tornado application that uses Motor:

db = motor.motor_tornado.MotorClient().test_database

application = tornado.web.Application([
    (r'/', MainHandler)
], db=db)

application.listen(8888)
tornado.ioloop.IOLoop.current().start()

There are two things to note in this code. First, the MotorClient constructor doesn’t actually connect to the server; the client will initiate a connection when you attempt the first operation. Second, passing the database as the db keyword argument to Application makes it available to request handlers:

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        db = self.settings['db']

It is a common mistake to create a new client object for every request; this comes at a dire performance cost. Create the client when your application starts and reuse that one client for the lifetime of the process, as shown in these examples.

The Tornado HTTPServer class’s start() method is a simple way to fork multiple web servers and use all of your machine’s CPUs. However, you must create your MotorClient after forking:

# Create the application before creating a MotorClient.
application = tornado.web.Application([
    (r'/', MainHandler)
])

server = tornado.httpserver.HTTPServer(application)
server.bind(8888)

# Forks one process per CPU.
server.start(0)

# Now, in each child process, create a MotorClient.
application.settings['db'] = MotorClient().test_database
IOLoop.current().start()

For production-ready, multiple-CPU deployments of Tornado there are better methods than HTTPServer.start(). See Tornado’s guide to Running and deploying.

Getting a Collection

A collection is a group of documents stored in MongoDB, and can be thought of as roughly the equivalent of a table in a relational database. Getting a collection in Motor works the same as getting a database:

>>> collection = db.test_collection
>>> collection = db['test_collection']

Just like getting a reference to a database, getting a reference to a collection does no I/O and doesn’t require an await expression.

Inserting a Document

As in PyMongo, Motor represents MongoDB documents with Python dictionaries. To store a document in MongoDB, call insert_one() in an await expression:

>>> async def do_insert():
...     document = {'key': 'value'}
...     result = await db.test_collection.insert_one(document)
...     print('result %s' % repr(result.inserted_id))
...
>>>
>>> IOLoop.current().run_sync(do_insert)
result ObjectId('...')

See also: The MongoDB documentation on insert.

A typical beginner’s mistake with Motor is to insert documents in a loop, not waiting for each insert to complete before beginning the next:

>>> for i in range(2000):
...     db.test_collection.insert_one({'i': i})

In PyMongo this would insert each document in turn using a single socket, but Motor attempts to run all the insert_one() operations at once. This requires up to max_pool_size open sockets connected to MongoDB, which taxes the client and server. To ensure instead that all inserts run in sequence, use await:

>>> async def do_insert():
...     for i in range(2000):
...         await db.test_collection.insert_one({'i': i})
...
>>> IOLoop.current().run_sync(do_insert)

See also: Bulk Write Operations, and the MongoDB documentation on insert.

For better performance, insert documents in large batches with insert_many():

>>> async def do_insert():
...     result = await db.test_collection.insert_many(
...         [{'i': i} for i in range(2000)])
...     print('inserted %d docs' % (len(result.inserted_ids),))
...
>>> IOLoop.current().run_sync(do_insert)
inserted 2000 docs
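
To see why the unawaited loop differs from the awaited one, the following sketch counts how many operations are in flight at once. It is a simulation under stated assumptions and does not use Motor or MongoDB: FakeCollection is a hypothetical stand-in whose insert_one() merely yields to the event loop, and asyncio.gather() is used to start every operation before waiting, which approximates firing off inserts without await.

```python
import asyncio


class FakeCollection:
    """Hypothetical stand-in for a collection; tracks concurrent operations."""

    def __init__(self):
        self.in_flight = 0
        self.max_in_flight = 0
        self.docs = []

    async def insert_one(self, doc):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        await asyncio.sleep(0)  # yield to the event loop, as network I/O would
        self.docs.append(doc)
        self.in_flight -= 1


async def insert_all_at_once(coll, n):
    # Every operation is started before any of them has finished.
    await asyncio.gather(*(coll.insert_one({'i': i}) for i in range(n)))


async def insert_in_sequence(coll, n):
    # Each insert completes before the next one begins.
    for i in range(n):
        await coll.insert_one({'i': i})


async def main():
    at_once, in_sequence = FakeCollection(), FakeCollection()
    await insert_all_at_once(at_once, 20)
    await insert_in_sequence(in_sequence, 20)
    return at_once.max_in_flight, in_sequence.max_in_flight


concurrent_peak, sequential_peak = asyncio.run(main())
print('peak concurrent operations:', concurrent_peak, 'vs', sequential_peak)
```

In the simulation the first strategy has all 20 operations pending at the same moment, while the awaited loop never has more than one.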

Getting a Single Document With find_one()

Use find_one() to get the first document that matches a query. For example, to get a document where the value for key “i” is less than 1:

>>> async def do_find_one():
...     document = await db.test_collection.find_one({'i': {'$lt': 1}})
...     pprint.pprint(document)
...
>>> IOLoop.current().run_sync(do_find_one)
{'_id': ObjectId('...'), 'i': 0}

The result is a dictionary matching the one that we inserted previously. The returned document contains an “_id”, which was automatically added on insert. (We use pprint here instead of print to ensure the document’s key names are sorted the same in your output as ours.)

See also: The MongoDB documentation on find.

Querying for More Than One Document

Use find() to query for a set of documents. find() does no I/O and does not require an await expression. It merely creates a MotorCursor instance. The query is actually executed on the server when you call to_list() or execute an async for loop. To find all documents with “i” less than 5:

>>> async def do_find():
...     cursor = db.test_collection.find({'i': {'$lt': 5}}).sort('i')
...     for document in await cursor.to_list(length=100):
...         pprint.pprint(document)
...
>>> IOLoop.current().run_sync(do_find)
{'_id': ObjectId('...'), 'i': 0}
{'_id': ObjectId('...'), 'i': 1}
{'_id': ObjectId('...'), 'i': 2}
{'_id': ObjectId('...'), 'i': 3}
{'_id': ObjectId('...'), 'i': 4}

A length argument is required when you call to_list to prevent Motor from buffering an unlimited number of documents.

  • async for
    You can handle one document at a time in an async for loop:
    >>> async def do_find():
    ...     c = db.test_collection
    ...     async for document in c.find({'i': {'$lt': 2}}):
    ...         pprint.pprint(document)
    ...
    >>> IOLoop.current().run_sync(do_find)
    {'_id': ObjectId('...'), 'i': 0}
    {'_id': ObjectId('...'), 'i': 1}
    

    You can apply a sort, limit, or skip to a query before you begin iterating:

    >>> async def do_find():
    ...     cursor = db.test_collection.find({'i': {'$lt': 4}})
    ...     # Modify the query before iterating
    ...     cursor.sort('i', -1).skip(1).limit(2)
    ...     async for document in cursor:
    ...         pprint.pprint(document)
    ...
    >>> IOLoop.current().run_sync(do_find)
    {'_id': ObjectId('...'), 'i': 2}
    {'_id': ObjectId('...'), 'i': 1}
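
As a cross-check of the output above, the same sort, skip, and limit steps can be modelled on a plain Python list. This sketch does not use Motor or MongoDB; the docs list is a stand-in for the matching documents with “i” less than 4, without their “_id” fields.

```python
docs = [{'i': i} for i in range(4)]  # stand-in for documents with i < 4

# sort('i', -1): order by 'i' descending
ordered = sorted(docs, key=lambda d: d['i'], reverse=True)

# skip(1) drops the first result; limit(2) keeps at most two of the rest
skip, limit = 1, 2
page = ordered[skip:skip + limit]

print(page)
```

MongoDB applies the sort first, then the skip, then the limit, which is the order modelled here.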
    

    The cursor does not actually retrieve each document from the server individually; it gets documents efficiently in large batches.

  • Iteration in Python 3.4
    In Python versions without async for, handle one document at a time with fetch_next and next_object():
    >>> @gen.coroutine
    ... def do_find():
    ...     cursor = db.test_collection.find({'i': {'$lt': 5}})
    ...     while (yield cursor.fetch_next):
    ...         document = cursor.next_object()
    ...         pprint.pprint(document)
    ...
    >>> IOLoop.current().run_sync(do_find)
    {'_id': ObjectId('...'), 'i': 0}
    {'_id': ObjectId('...'), 'i': 1}
    {'_id': ObjectId('...'), 'i': 2}
    {'_id': ObjectId('...'), 'i': 3}
    {'_id': ObjectId('...'), 'i': 4}
    

Counting Documents

Use count_documents() to determine the number of documents in a collection, or the number of documents that match a query:

    >>> async def do_count():
    ...     n = await db.test_collection.count_documents({})
    ...     print('%s documents in collection' % n)
    ...     n = await db.test_collection.count_documents({'i': {'$gt': 1000}})
    ...     print('%s documents where i > 1000' % n)
    ...
    >>> IOLoop.current().run_sync(do_count)
    2000 documents in collection
    999 documents where i > 1000
    

Updating Documents

replace_one() changes a document. It requires two parameters: a query that specifies which document to replace, and a replacement document. The query follows the same syntax as for find() or find_one(). To replace a document:

    >>> async def do_replace():
    ...     coll = db.test_collection
    ...     old_document = await coll.find_one({'i': 50})
    ...     print('found document: %s' % pprint.pformat(old_document))
    ...     _id = old_document['_id']
    ...     result = await coll.replace_one({'_id': _id}, {'key': 'value'})
    ...     print('replaced %s document' % result.modified_count)
    ...     new_document = await coll.find_one({'_id': _id})
    ...     print('document is now %s' % pprint.pformat(new_document))
    ...
    >>> IOLoop.current().run_sync(do_replace)
    found document: {'_id': ObjectId('...'), 'i': 50}
    replaced 1 document
    document is now {'_id': ObjectId('...'), 'key': 'value'}
    

    You can see that replace_one() replaced everything in the old document except its _id with the new document. Use update_one() with MongoDB’s modifier operators to update part of a document and leave the rest intact. We’ll find the document whose “i” is 51 and use the $set operator to set “key” to “value”:

    >>> async def do_update():
    ...     coll = db.test_collection
    ...     result = await coll.update_one({'i': 51}, {'$set': {'key': 'value'}})
    ...     print('updated %s document' % result.modified_count)
    ...     new_document = await coll.find_one({'i': 51})
    ...     print('document is now %s' % pprint.pformat(new_document))
    ...
    >>> IOLoop.current().run_sync(do_update)
    updated 1 document
    document is now {'_id': ObjectId('...'), 'i': 51, 'key': 'value'}

    “key” is set to “value” and “i” is still 51. update_one() only affects the first document it finds; you can update all matching documents with update_many():

    await coll.update_many({'i': {'$gt': 100}},
                         {'$set': {'key': 'value'}})
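
The contrast between replacing a document and updating it with $set can be modelled with plain dictionaries. This is a sketch only: it does not call Motor or MongoDB, the helper names replace_document and set_fields are hypothetical, and the '_id' strings are placeholders for real ObjectId values.

```python
def replace_document(old, replacement):
    """Model of replace_one(): everything is replaced except '_id'."""
    new = dict(replacement)
    new['_id'] = old['_id']
    return new


def set_fields(old, fields):
    """Model of update_one() with {'$set': fields}: other keys are kept."""
    new = dict(old)
    new.update(fields)
    return new


original = {'_id': 'id-50', 'i': 50}  # placeholder '_id'
replaced = replace_document(original, {'key': 'value'})
updated = set_fields({'_id': 'id-51', 'i': 51}, {'key': 'value'})

print(replaced)  # 'i' is gone, only '_id' and 'key' remain
print(updated)   # 'i' is still present next to the new 'key'
```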
    

    See also: The MongoDB documentation on update.

Removing Documents

delete_many() takes a query with the same syntax as find(). delete_many() immediately removes all matching documents.

>>> async def do_delete_many():
...     coll = db.test_collection
...     n = await coll.count_documents({})
...     print('%s documents before calling delete_many()' % n)
...     result = await db.test_collection.delete_many({'i': {'$gte': 1000}})
...     print('%s documents after' % (await coll.count_documents({})))
...
>>> IOLoop.current().run_sync(do_delete_many)
2000 documents before calling delete_many()
1000 documents after

See also: The MongoDB documentation on remove.

Commands

All operations on MongoDB are implemented internally as commands. Run them using the command() method on MotorDatabase:

>>> from bson import SON
>>> async def use_distinct_command():
...     response = await db.command(SON([("distinct", "test_collection"),
...                                      ("key", "i")]))
...
>>> IOLoop.current().run_sync(use_distinct_command)

Since the order of command parameters matters, don’t use a Python dict to pass the command’s parameters. Instead, make a habit of using bson.SON, from the bson module included with PyMongo.

Many commands have special helper methods, such as create_collection() or aggregate(), but these are just conveniences atop the basic command() method.

See also: The MongoDB documentation on commands.

Further Reading

The handful of classes and methods introduced here are sufficient for daily tasks. The API documentation for MotorClient, MotorDatabase, MotorCollection, and MotorCursor provides a reference to Motor’s complete feature set.

Learning to use the MongoDB driver is just the beginning, of course. For in-depth instruction in MongoDB itself, see The MongoDB Manual.
