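Feb 5, 2020 - Covering the whole Python development workflow with jupyter-notebook, nbdev, and GitHub

Preface

I used to write all my code in the Sublime Text editor. As a hobbyist programmer, I have to say it is an excellent editor that covers nearly everything I need for writing code. Its drawback is just as clear: once the code is written, you have to switch to a command-line window to test it, handle version control, or do anything else. Sublime has many plugins that might fill these gaps, but plugins are limited and a suitable one is not always easy to find. Recently I read an article introducing nbdev, a tool built on jupyter-notebook that is described as able to cover the entire programming life cycle and to automate a lot of work in combination with GitHub. With the coronavirus outbreak keeping me at home, I have time to study it and find out whether it can be a real upgrade for me.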

jupyter-notebook

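First, a few words about jupyter-notebook. It is an interactive notebook that supports more than 40 programming languages. Jupyter Notebook creates and shares program documents in the browser, with support for live code, mathematical equations, visualizations, and markdown. Uses include data cleaning and transformation, numerical simulation, statistical modeling, machine learning, and more. For writing Python it is somewhat like the IDLE that ships with Python, but with more features: it can edit more types of files and can save the output of the code it runs. It has the following advantages:
- Choice of languages: it supports more than 40 programming languages, including Python, R, Julia, and Scala.
- Sharing notebooks: notebooks can be shared with others via email, Dropbox, GitHub, and the Jupyter Notebook Viewer.
- Interactive output: code can produce rich interactive output, including HTML, images, video, LaTeX, and more.
- Big data integration: big-data frameworks such as Apache Spark can be used from Python, R, and Scala, and the same data can be explored with pandas, scikit-learn, ggplot2, and TensorFlow.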

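Installation: installing it is simple. Because it is written in Python, just run pip install jupyter; after a long wait the installation completed successfully (the default pip server is really slow, so consider switching to a mirror inside China). When installing, check that python and its pip command are on the PATH environment variable; if they are not, add them.
Running: on the command line, execute: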
```
jupyter notebook
```
The command starts a web server and opens the default browser at http://localhost:8888/tree, which shows the file list of the current directory.
Basic usage
```
jupyter notebook --generate-config
```
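This generates a configuration file in the user directory. Open the ".jupyter" folder and you will see the configuration file, which is used by default on the next start. From here the basic steps are: create a new document, type a statement into a cell and run it, and look up the keyboard shortcuts. That is all you need for basic use of jupyter. You can also press Ctrl+S at any time to save the document, so that everything you did is recorded and can be reopened next time.

Introducing nbdev

IDEs and editors offer features that a notebook or REPL lacks, such as good documentation lookup, good syntax highlighting, integrated unit testing, and (critically) the ability to produce final, distributable source code files. nbdev strengthens so-called "exploratory" programming while bringing the advantages of IDE/editor development into the notebook system, so that users can do their development in notebooks without compromising the rest of the project life cycle. Most of this post is based on the nbdev tutorial.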

Installation

Install it like any other Python module:

pip install nbdev

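Setting up the repo

Create from the template: after logging in to your GitHub account, click nbdev template (https://github.com/fastai/nbdev_template/generate). This automatically creates a repo under your account containing a minimal nbdev-compatible template. Use it as a starting point: clone the repo locally and start writing code.
GitHub Pages: the docs directory of the template holds a GitHub-hosted site (the project documentation). To enable it, open the newly created repo on GitHub, go to the Settings menu, scroll down to GitHub Pages, set "Source" to "Master branch /docs folder", and save. When you return to that option, it shows a URL; this is the documentation home page of the repo. Then go back to the repo page, click Edit, and paste that URL into the Website field.

Editing settings.ini

This file is generated automatically in the root directory and contains all the information needed to package the library. Uncomment the following lines and fill in your own values (otherwise code generation fails with an error):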

# lib_name = your_project_name
# user = your_github_username
# description = A description of your project
# keywords = some keywords
# author = Your Name
# author_email = email@example.com
# copyright = Your Name or Company Name

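Directory structure: the Jupyter notebook files in the root directory; the docs directory (mentioned above); and a directory into which your packaged Python code is generated.

Installing the git hooks

Jupyter notebooks can cause version-control conflicts. In a terminal, cd into the project directory and run: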

nbdev_install_git_hooks

With the hooks installed, notebook metadata is stripped on commit, which reduces the chance of conflicts.
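
To make "stripping metadata" a little more concrete, here is a minimal sketch using only the standard library. It is not nbdev's implementation: the helper name `strip_volatile_fields` and the choice of which fields to clear are assumptions made for illustration. It works on the notebook JSON structure, in which each cell carries a `metadata` dict and code cells also carry an `execution_count`.

```python
import copy
import json


def strip_volatile_fields(notebook):
    """Return a copy of a notebook dict without per-run noise.

    Hypothetical helper for illustration only; which fields to clear
    is an assumption, not nbdev's actual rule set.
    """
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        cell["metadata"] = {}                # editor state such as scrolled/collapsed
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None   # changes every time the cell is run
    return cleaned


# Two copies of the same notebook that differ only in run-specific fields.
nb_a = {
    "cells": [{"cell_type": "code", "source": ["print(1)"],
               "metadata": {"scrolled": True}, "execution_count": 3,
               "outputs": []}],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
}
nb_b = copy.deepcopy(nb_a)
nb_b["cells"][0]["execution_count"] = 7
nb_b["cells"][0]["metadata"] = {}

same_after_cleaning = (
    json.dumps(strip_volatile_fields(nb_a), sort_keys=True)
    == json.dumps(strip_volatile_fields(nb_b), sort_keys=True)
)
print(same_after_cleaning)
```

Two people who ran the same notebook a different number of times would otherwise produce differing files, which is the kind of spurious conflict the hooks are meant to avoid.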

Editing 00_core.ipynb

Open 00_core.ipynb; this is the file where you can start writing your code.

# default_exp core

This flag means that the module lib_name/core.py will be generated.

#export
def say_hello(to):
    "Say hello to somebody"
    return f'Hello {to}!'

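This defines a function; #export means that it will be included in the generated library file. After that you can write some documentation, including a test: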

assert say_hello("Jeremy")=="Hello Jeremy!"

Generating the code

You can run nbdev_build_lib on the command line, or run the following in a notebook:

from nbdev.export import *
notebook2script()

When it finishes, the corresponding core.py file appears in the lib_name directory.
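
The idea behind this export step can be sketched in a few lines of plain Python. This is only an illustration of the concept, not how nbdev is implemented: the function `build_module` and the representation of a notebook as a plain list of cell sources are assumptions. Cells flagged with #export are collected, the # default_exp flag names the target module, and everything else (such as the test cell) stays in the notebook.

```python
def build_module(cells):
    """Collect exported code cells into (module_name, source).

    Illustrative only: `build_module` and the plain-list cell format
    are assumptions, not nbdev's real API or file format.
    """
    module_name = None
    exported = []
    for source in cells:
        lines = source.strip().splitlines()
        if not lines:
            continue
        first = lines[0].strip()
        if first.startswith("# default_exp"):
            module_name = first.split()[-1]           # e.g. "core"
        elif first.replace(" ", "") == "#export":
            exported.append("\n".join(lines[1:]))     # drop the flag line itself
    return module_name, "\n\n".join(exported) + "\n"


cells = [
    "# default_exp core",
    "#export\ndef say_hello(to):\n    \"Say hello to somebody\"\n    return f'Hello {to}!'",
    "assert say_hello('Jeremy') == 'Hello Jeremy!'",  # test cell: not exported
]
name, source = build_module(cells)
print(name)
print(source)
```

The printed source contains only the say_hello function, which mirrors what ends up in core.py.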

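Editing index.ipynb

This notebook is automatically converted into index.html in the docs directory, which is the landing page of the project documentation. The examples in the documentation should be notebook cells that include their output.

Generating the docs

Simply run nbdev_build_docs on the command line to generate the documentation.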

$ nbdev_build_docs
converting: /home/jhoward/git/nbdev/nbs/00_core.ipynb
converting: /home/jhoward/git/nbdev/nbs/index.ipynb

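In the docs directory you can see that 00_core.ipynb produced core.html and index.ipynb produced index.html.

Committing to GitHub

At this point you can use git commit and git push to send your changes to GitHub. In my own test the modified files had not been added to tracking yet, so I first ran git add -A and only then committed and pushed everything to GitHub. Thanks to GitHub's recent Actions feature, GitHub runs the tests and other tasks automatically after a push.
The main purpose of GitHub Actions is to let users run and test code directly on GitHub's servers; building, sharing, and running code takes only a few simple steps. It implements what is known as CI (continuous integration). In a continuous integration environment, developers commit code to the main branch frequently. Before these commits are finally merged into the mainline, they must be verified by a build and an automated test pipeline, and on GitHub all of this is free. On the project's GitHub home page, click commits to see the record of each commit, including whether its tests passed.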

References

Official nbdev website: https://nbdev.fast.ai/
nbdev: use Jupyter Notebooks for everything: https://www.fast.ai/2019/12/02/nbdev/
Chinese version: https://tech.sina.com.cn/roll/2020-02-03/doc-iimxxste8470190.shtml

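Sep 1, 2019 - Flask-RESTful user guide: quickstart (excerpted from the official documentation)

What is Flask-RESTful

Flask-RESTful is an extension for Flask that adds support for quickly building REST APIs. It is a lightweight abstraction that works with your existing ORM/libraries. Flask-RESTful encourages best practices with minimal setup. If you are familiar with Flask, Flask-RESTful should be easy to pick up. The official documentation is at https://flask-restful.readthedocs.io/en/latest/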

Installation

Install with pip

pip install flask-restful

Flask-RESTful requires Flask 0.10 or later and Python 2.7, 3.4, 3.5, 3.6, or 3.7.

Quickstart

A minimal API

```
from flask import Flask
from flask_restful import Resource, Api

app = Flask(__name__)
api = Api(app)


class HelloWorld(Resource):
    def get(self):
        return {'hello': 'world'}


api.add_resource(HelloWorld, '/')

if __name__ == '__main__':
    app.run(debug=True)
```

Save it as api.py and run:

```
$ python api.py
 * Running on http://127.0.0.1:5000/
 * Restarting with reloader
```

Then, on the command line, run $ curl http://127.0.0.1:5000/ to check that the output is {"hello": "world"}.

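Resourceful routing

The main building block provided by Flask-RESTful is the resource. Resources are built on top of Flask pluggable views (http://flask.pocoo.org/docs/views/), giving you easy access to multiple HTTP methods just by defining methods on your resource. A basic CRUD resource for a todo application (of course) looks like this: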

from flask import Flask, request
from flask_restful import Resource, Api

app = Flask(__name__)
api = Api(app)

todos = {}

class TodoSimple(Resource):
    def get(self, todo_id):
        return {todo_id: todos[todo_id]}

    def put(self, todo_id):
        todos[todo_id] = request.form['data']
        return {todo_id: todos[todo_id]}

api.add_resource(TodoSimple, '/<string:todo_id>')

if __name__ == '__main__':
    app.run(debug=True)

You can try it like this:

$ curl http://localhost:5000/todo1 -d "data=Remember the milk" -X PUT
{"todo1": "Remember the milk"}
$ curl http://localhost:5000/todo1
{"todo1": "Remember the milk"}
$ curl http://localhost:5000/todo2 -d "data=Change my brakepads" -X PUT
{"todo2": "Change my brakepads"}
$ curl http://localhost:5000/todo2
{"todo2": "Change my brakepads"}

If you have the requests library installed, you can also do this from Python:

>>> from requests import put, get
>>> put('http://localhost:5000/todo1', data={'data': 'Remember the milk'}).json()
{u'todo1': u'Remember the milk'}
>>> get('http://localhost:5000/todo1').json()
{u'todo1': u'Remember the milk'}
>>> put('http://localhost:5000/todo2', data={'data': 'Change my brakepads'}).json()
{u'todo2': u'Change my brakepads'}
>>> get('http://localhost:5000/todo2').json()
{u'todo2': u'Change my brakepads'}

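Flask-RESTful understands multiple kinds of return values from view methods. Similar to Flask, you can return any iterable, including raw Flask response objects, and it will be converted into a response. Flask-RESTful also supports setting the response code and response headers using multiple return values, as shown below: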

class Todo1(Resource):
    def get(self):
        # Default to 200 OK
        return {'task': 'Hello world'}

class Todo2(Resource):
    def get(self):
        # Set the response code to 201
        return {'task': 'Hello world'}, 201

class Todo3(Resource):
    def get(self):
        # Set the response code to 201 and return custom headers
        return {'task': 'Hello world'}, 201, {'Etag': 'some-opaque-string'}
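
The convention in these three resources (data only, data plus status code, data plus status code plus headers) can be sketched as a small normalizing function. This is a standalone illustration and not the library's own code; the name `unpack_result` is hypothetical.

```python
def unpack_result(value):
    """Normalize a view return value to (data, status, headers).

    Hypothetical helper sketching the convention described above.
    """
    if not isinstance(value, tuple):
        return value, 200, {}          # plain data: default to 200 OK
    if len(value) == 3:
        data, status, headers = value
        return data, status, headers
    if len(value) == 2:
        data, status = value
        return data, status, {}
    return value, 200, {}              # any other tuple is treated as data


print(unpack_result({'task': 'Hello world'}))
print(unpack_result(({'task': 'Hello world'}, 201)))
print(unpack_result(({'task': 'Hello world'}, 201, {'Etag': 'some-opaque-string'})))
```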

Endpoints

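Many times in an API, your resource will have multiple URLs. You can pass multiple URLs to the add_resource() method on the Api object. Each one will be routed to your resource.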

api.add_resource(HelloWorld,
    '/',
    '/hello')

You can also match parts of the path as variables that are passed to your resource methods.

api.add_resource(Todo,
    '/todo/<int:todo_id>', endpoint='todo_ep')

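Argument parsing

While Flask provides easy access to request data (that is, the query string or POST form-encoded data), validating form data is still a pain. Flask-RESTful has built-in support for request data validation using a library similar to argparse.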

from flask_restful import reqparse

parser = reqparse.RequestParser()
parser.add_argument('rate', type=int, help='Rate to charge for this resource')
args = parser.parse_args()

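Note: unlike the argparse module, reqparse.RequestParser.parse_args() returns a Python dictionary instead of a custom data structure.

Using the reqparse module also gives you sane error messages for free. If an argument fails validation, Flask-RESTful responds with a 400 Bad Request and a response that highlights the error.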

$ curl -d 'rate=foo' http://127.0.0.1:5000/todos
{'status': 400, 'message': 'foo cannot be converted to int'}

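The inputs module provides a number of ready-made common conversion functions, such as inputs.date() and inputs.url().

Calling parse_args with strict=True ensures that an error is raised if the request includes arguments that your parser does not define.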

args = parser.parse_args(strict=True)
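
What type conversion and strict mode amount to can be sketched without Flask at all. The classes below are hypothetical stand-ins written for this illustration (`MiniParser` and `BadRequest` are not part of Flask-RESTful), and the raw request data is assumed to be a plain dict of strings.

```python
class BadRequest(Exception):
    """Stands in for an HTTP 400 response in this sketch."""


class MiniParser:
    """Hypothetical, heavily simplified stand-in for a request parser."""

    def __init__(self):
        self.arguments = {}   # argument name -> conversion function

    def add_argument(self, name, type=str):
        # The keyword is called `type` only to mirror the examples above.
        self.arguments[name] = type

    def parse_args(self, raw, strict=False):
        if strict:
            unknown = sorted(set(raw) - set(self.arguments))
            if unknown:
                raise BadRequest("Unknown arguments: " + ", ".join(unknown))
        parsed = {}
        for name, convert in self.arguments.items():
            if name not in raw:
                parsed[name] = None            # missing optional argument
                continue
            try:
                parsed[name] = convert(raw[name])
            except (TypeError, ValueError):
                raise BadRequest(
                    f"{raw[name]} cannot be converted to {convert.__name__}"
                ) from None
        return parsed


parser = MiniParser()
parser.add_argument('rate', type=int)
print(parser.parse_args({'rate': '5'}))        # {'rate': 5}
try:
    parser.parse_args({'rate': 'foo'})
except BadRequest as err:
    print(err)                                 # foo cannot be converted to int
```

With strict=True, a call such as parser.parse_args({'rate': '5', 'extra': '1'}, strict=True) raises BadRequest because extra was never declared.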

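Data formatting

By default, all fields in your returned iterable are rendered as-is. While this works well when you are dealing with Python data structures, it can become very frustrating when working with objects. To solve this, Flask-RESTful provides the fields module and the marshal_with() decorator. Similar to the Django ORM and WTForms, you use the fields module to describe the structure of your response.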

from flask_restful import fields, marshal_with

resource_fields = {
    'task':   fields.String,
    'uri':    fields.Url('todo_ep')
}

class TodoDao(object):
    def __init__(self, todo_id, task):
        self.todo_id = todo_id
        self.task = task

        # This field will not be sent in the response
        self.status = 'active'

class Todo(Resource):
    @marshal_with(resource_fields)
    def get(self, **kwargs):
        return TodoDao(todo_id='my_todo', task='Remember the milk')

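The example above takes a Python object and prepares it to be serialized. The marshal_with() decorator applies the transformation described by resource_fields. The only field extracted from the object is task. The fields.Url field is a special field that takes an endpoint name and generates a URL for that endpoint in the response. Many of the field types you need are already included. See the "fields" guide for a complete list.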
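
The core idea of marshalling, picking only the declared fields from an object, can be shown with a few lines of plain Python. This is a rough sketch rather than the library's implementation: `marshal_object` is a hypothetical helper, the field spec is simplified to a mapping from attribute name to a formatting function, and the uri field is left out because it depends on Flask's URL routing.

```python
def marshal_object(obj, field_spec):
    """Build a dict containing only the fields named in field_spec.

    field_spec maps an output key to a function that formats the
    attribute value. Hypothetical helper, for illustration only.
    """
    return {key: formatter(getattr(obj, key))
            for key, formatter in field_spec.items()}


class TodoDao:
    def __init__(self, todo_id, task):
        self.todo_id = todo_id
        self.task = task
        self.status = 'active'   # not listed in the spec, so never emitted


spec = {'task': str}
result = marshal_object(TodoDao('my_todo', 'Remember the milk'), spec)
print(result)   # {'task': 'Remember the milk'}
```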

Full example

Save this example in api.py

from flask import Flask
from flask_restful import reqparse, abort, Api, Resource

app = Flask(__name__)
api = Api(app)

TODOS = {
    'todo1': {'task': 'build an API'},
    'todo2': {'task': '?????'},
    'todo3': {'task': 'profit!'},
}


def abort_if_todo_doesnt_exist(todo_id):
    if todo_id not in TODOS:
        abort(404, message="Todo {} doesn't exist".format(todo_id))

parser = reqparse.RequestParser()
parser.add_argument('task')


# Todo
# shows a single todo item and lets you delete a todo item
class Todo(Resource):
    def get(self, todo_id):
        abort_if_todo_doesnt_exist(todo_id)
        return TODOS[todo_id]

    def delete(self, todo_id):
        abort_if_todo_doesnt_exist(todo_id)
        del TODOS[todo_id]
        return '', 204

    def put(self, todo_id):
        args = parser.parse_args()
        task = {'task': args['task']}
        TODOS[todo_id] = task
        return task, 201


# TodoList
# shows a list of all todos, and lets you POST to add new tasks
class TodoList(Resource):
    def get(self):
        return TODOS

    def post(self):
        args = parser.parse_args()
        todo_id = int(max(TODOS.keys()).lstrip('todo')) + 1
        todo_id = 'todo%i' % todo_id
        TODOS[todo_id] = {'task': args['task']}
        return TODOS[todo_id], 201

##
## Actually setup the Api resource routing here
##
api.add_resource(TodoList, '/todos')
api.add_resource(Todo, '/todos/<todo_id>')


if __name__ == '__main__':
    app.run(debug=True)
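
One detail of TodoList.post deserves a note: max(TODOS.keys()) compares the keys as strings, so once there are ten or more items, 'todo9' sorts after 'todo10' and the computed ID collides with an existing one. The sketch below shows the effect and one possible alternative that compares the numeric suffixes instead. The helper `next_todo_id` is hypothetical and assumes every key has the form 'todo<number>'.

```python
def next_todo_id(todos):
    """Return the next 'todoN' key, comparing the numeric suffixes.

    Hypothetical helper; assumes every key looks like 'todo<number>'.
    """
    highest = max((int(key[len('todo'):]) for key in todos), default=0)
    return 'todo%i' % (highest + 1)


todos = {'todo%i' % n: {'task': 'task %i' % n} for n in range(1, 11)}
print(max(todos.keys()))      # todo9, because strings compare character by character
print(next_todo_id(todos))    # todo11
```

The default=0 argument also covers the case where every item has been deleted, in which max() on an empty sequence would otherwise raise ValueError.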

Example usage

$ python api.py
 * Running on http://127.0.0.1:5000/
 * Restarting with reloader

Get the list

$ curl http://localhost:5000/todos
{"todo1": {"task": "build an API"}, "todo3": {"task": "profit!"}, "todo2": {"task": "?????"}}

Get a single task

$ curl http://localhost:5000/todos/todo3
{"task": "profit!"}

Delete a task

$ curl http://localhost:5000/todos/todo2 -X DELETE -v

> DELETE /todos/todo2 HTTP/1.1
> User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
> Host: localhost:5000
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 204 NO CONTENT
< Content-Type: application/json
< Content-Length: 0
< Server: Werkzeug/0.8.3 Python/2.7.2
< Date: Mon, 01 Oct 2012 22:10:32 GMT

Add a new task

$ curl http://localhost:5000/todos -d "task=something new" -X POST -v

> POST /todos HTTP/1.1
> User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
> Host: localhost:5000
> Accept: */*
> Content-Length: 18
> Content-Type: application/x-www-form-urlencoded
>
* HTTP 1.0, assume close after body
< HTTP/1.0 201 CREATED
< Content-Type: application/json
< Content-Length: 25
< Server: Werkzeug/0.8.3 Python/2.7.2
< Date: Mon, 01 Oct 2012 22:12:58 GMT
<
* Closing connection #0
{"task": "something new"}

Update a task

$ curl http://localhost:5000/todos/todo3 -d "task=something different" -X PUT -v

> PUT /todos/todo3 HTTP/1.1
> Host: localhost:5000
> Accept: */*
> Content-Length: 20
> Content-Type: application/x-www-form-urlencoded
>
* HTTP 1.0, assume close after body
< HTTP/1.0 201 CREATED
< Content-Type: application/json
< Content-Length: 27
< Server: Werkzeug/0.8.3 Python/2.7.3
< Date: Mon, 01 Oct 2012 22:13:00 GMT
<
* Closing connection #0
{"task": "something different"}

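Sep 1, 2019 - How does attribute lookup work in Python?

How does attribute lookup work in Python?

Anyone learning Python eventually has to deal with attribute lookup. Referencing an attribute is not simply reading a value stored on the object: there are rules that decide which object's attribute or method is actually used, and they involve descriptors and decorators. I read many online articles about what exactly those rules are; they are not wrong, but each is written from a different angle and I never felt the matter was explained clearly. After some experiments I finally understood the mechanism behind it.
The following is reposted from: https://www.cnblogs.com/Jimmy1988/p/6808237.html
In Python, attribute lookup is fairly complex, especially when descriptors are involved. First, recall that everything in Python is an object ("everything is object"), including classes, class instances, numbers, and modules, and every object is an instance of some class (class or type). If a descriptor implements only __get__, we call it a non-data descriptor; if it implements both __get__ and __set__, we call it a data descriptor. According to the Python docs, if obj is an instance of some class, obj.name first calls __getattribute__. If the class defines a __getattr__ method, it is called when __getattribute__ raises AttributeError. Descriptor calls (__get__) happen inside __getattribute__.
Given obj = Clz(), obj.attr is resolved in the following order: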

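        (1) If "attr" is found in the __dict__ of Clz or one of its base classes and attr is a data descriptor, call its __get__ method; otherwise
        (2) if "attr" is found in obj's __dict__, return obj.__dict__['attr'] directly; otherwise
        (3) if "attr" is found in the __dict__ of Clz or one of its base classes:
            (3.1) if attr is a non-data descriptor, call its __get__ method; otherwise
            (3.2) return __dict__['attr']
        (4) If Clz has a __getattr__ method, call it; otherwise
        (5) raise AttributeError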
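
The five rules can also be written down as a pure-Python emulation of the default lookup. This is a simplified sketch for study, not CPython's actual code: the names `lookup` and `find_in_mro` are made up here, and details such as `__slots__` are ignored.

```python
_MISSING = object()


def find_in_mro(cls, name):
    """Search the __dict__ of cls and its base classes, in MRO order."""
    for base in cls.__mro__:
        if name in vars(base):
            return vars(base)[name]
    return _MISSING


def lookup(obj, name):
    """Simplified emulation of how obj.name is resolved."""
    cls = type(obj)
    cls_attr = find_in_mro(cls, name)
    attr_type = type(cls_attr)
    has_get = cls_attr is not _MISSING and hasattr(attr_type, '__get__')
    is_data = has_get and (hasattr(attr_type, '__set__')
                           or hasattr(attr_type, '__delete__'))
    if is_data:                                   # rule 1: data descriptor wins
        return attr_type.__get__(cls_attr, obj, cls)
    instance_dict = getattr(obj, '__dict__', {})
    if name in instance_dict:                     # rule 2: instance __dict__
        return instance_dict[name]
    if has_get:                                   # rule 3.1: non-data descriptor
        return attr_type.__get__(cls_attr, obj, cls)
    if cls_attr is not _MISSING:                  # rule 3.2: plain class attribute
        return cls_attr
    fallback = find_in_mro(cls, '__getattr__')
    if fallback is not _MISSING:                  # rule 4: __getattr__ hook
        return fallback(obj, name)
    raise AttributeError(name)                    # rule 5


class Data:
    def __get__(self, obj, owner):
        return 'data'

    def __set__(self, obj, value):
        pass


class NonData:
    def __get__(self, obj, owner):
        return 'non-data'


class C:
    d = Data()
    n = NonData()
    plain = 'class value'

    def __getattr__(self, name):
        return 'fallback:' + name


c = C()
c.__dict__['d'] = 'shadow'     # loses against the data descriptor
c.__dict__['n'] = 'shadow'     # wins against the non-data descriptor
print(lookup(c, 'd'), lookup(c, 'n'), lookup(c, 'plain'), lookup(c, 'missing'))
```

For each of the four names, lookup(c, name) gives the same result as the real getattr(c, name).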

Here is the test code:
```
#coding=utf-8
class DataDescriptor(object):
    def __init__(self, init_value):
        self.value = init_value
  
    def __get__(self, instance, typ):
        return 'DataDescriptor __get__'
  
    def __set__(self, instance, value):
        print ('DataDescriptor __set__')
        self.value = value
 
class NonDataDescriptor(object):
    def __init__(self, init_value):
        self.value = init_value

    def __get__(self, instance, typ):
        return('NonDataDescriptor __get__')
 
class Base(object):
    dd_base = DataDescriptor(0)
    ndd_base = NonDataDescriptor(0)

 
class Derive(Base):
    dd_derive = DataDescriptor(0)
    ndd_derive = NonDataDescriptor(0)
    same_name_attr = 'attr in class'
 
    def __init__(self):
        self.not_des_attr = 'I am not descriptor attr'
        self.same_name_attr = 'attr in object'
 
    def __getattr__(self, key):
        return '__getattr__ with key %s' % key
 
    def change_attr(self):
        self.__dict__['dd_base'] = 'dd_base now in object dict '
        self.__dict__['ndd_derive'] = 'ndd_derive now in object dict '
 
def main():
    b = Base()
    d = Derive()
    print('Derive object dict', d.__dict__)
    assert d.dd_base == "DataDescriptor __get__"
    assert d.ndd_derive == 'NonDataDescriptor __get__'
    assert d.not_des_attr == 'I am not descriptor attr'
    assert d.no_exists_key == '__getattr__ with key no_exists_key'
    assert d.same_name_attr == 'attr in object'
    d.change_attr()
    print('Derive object dict', d.__dict__)
    assert d.dd_base != 'dd_base now in object dict '
    assert d.ndd_derive == 'ndd_derive now in object dict '

    try:
        b.no_exists_key
    except Exception as e:
        assert isinstance(e, AttributeError)
 
if __name__ == '__main__':
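    main()
```
After change_attr is called, dd_base appears both in the class __dict__ (as a data descriptor) and in the instance __dict__. Because of the lookup order, the value returned still comes from the descriptor in the class __dict__. ndd_derive also appears in the class __dict__, but because it is a non-data descriptor, obj.__dict__['ndd_derive'] takes priority. In addition, the assertion on d.no_exists_key and the b.no_exists_key access inside the try block show the role of __getattr__ (Derive defines it, Base does not), and the assertion on d.same_name_attr shows that obj.__dict__ takes priority over Clz.__dict__ for ordinary attributes.
As mentioned earlier, classes are objects too: a class is an instance of its metaclass, so class attribute lookup follows basically the same order. The difference is in step 2: since Clz may have base classes, "attr" is looked up in the __dict__ of Clz and of its base classes.
Additional note: many online articles say that __getattribute__ has the highest priority in attribute lookup. In practice we rarely override __getattribute__, because once it is overridden without delegating to the default implementation, none of the rules above apply any more. This is what the earlier remark that descriptor calls "happen inside __getattribute__" means.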
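
A short experiment makes this point visible. The class names below are made up for the illustration: `Hijacked` overrides __getattribute__ without delegating, so the descriptor's __get__ never runs, while `Delegating` hands the lookup back to object.__getattribute__ and keeps the default rules.

```python
class Answer:
    """Non-data descriptor used to observe whether __get__ runs."""

    def __get__(self, obj, owner):
        return 42


class Normal:
    value = Answer()


class Hijacked:
    value = Answer()

    def __getattribute__(self, name):
        # Never delegates, so the default lookup rules are skipped entirely.
        return 'intercepted ' + name


class Delegating:
    value = Answer()

    def __getattribute__(self, name):
        # Delegating to object.__getattribute__ keeps the default rules.
        return object.__getattribute__(self, name)


print(Normal().value)       # 42
print(Hijacked().value)     # intercepted value
print(Delegating().value)   # 42
```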
Now look at the following code:
```
import functools, time
class cached_property(object):
    """ A property that is only computed once per instance and then replaces
        itself with an ordinary attribute. Deleting the attribute resets the
        property. """

    def __init__(self, func):
        functools.update_wrapper(self, func)
        self.func = func

    def __get__(self, obj, cls):
        if obj is None: return self
        value = obj.__dict__[self.func.__name__] = self.func(obj)
        return value

class TestClz(object):
    @cached_property
    def complex_calc(self):
        print('very complex_calc')
        return sum(range(100))

if __name__=='__main__':
    t = TestClz()
    print('>>> first call')
    print(t.complex_calc)
    print('>>> second call')
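    print(t.complex_calc)
```
cached_property is a non-data descriptor. In TestClz, the method complex_calc is decorated with cached_property, so the name is bound to a descriptor instance, which is why no parentheses are used when accessing it. Before the first access to t.complex_calc, the __dict__ of obj (t) has no "complex_calc" entry, so by rule 3 of the lookup order cached_property.__get__ runs; it calls the wrapped complex_calc function to compute the result and stores the result in obj.__dict__. On the second access to t.complex_calc, rule 2 takes priority over rule 3, so obj.__dict__['complex_calc'] is returned directly.
Here is another example commonly seen online:
```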
class NotNegative():
    def __init__(self,name):
        self.name = name

    def __set__(self, instance, value):
        if value < 0:
            raise ValueError(self.name+' must be >= 0')
        else:
            instance.__dict__[self.name] = value

class Product():
    quantity = NotNegative('quantity')
    price = NotNegative('price')

    def __init__(self,name,quantity,price):
        self.name = name
        self.quantity = quantity
        self.price = price

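book = Product('mybook', 2, 5)
```
The descriptor restricts quantity and price to non-negative values. In this example, when book.quantity = 3 is executed, the interpreter finds that the class attribute quantity is a descriptor that defines __set__, so instead of writing straight into the instance's __dict__ it goes through the descriptor and calls the descriptor's __set__ special method.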

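The parameters of the descriptor's __set__ special method are:
```
self: the descriptor instance
instance: the instance the attribute belongs to (book in the example)
value: the value being assigned
```
The __get__ method likewise takes three parameters: self, instance, and owner. self and instance are the same as in __set__; owner is the Product class in the example.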
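
To see these parameters in action, here is a variant of the example that also defines __get__. It is a sketch under two assumptions that are not in the original: it relies on __set_name__ (available since Python 3.6) so the attribute name does not have to be repeated, and it returns the descriptor itself when accessed on the class, where instance is None.

```python
class NotNegative:
    def __set_name__(self, owner, name):
        # Called once when the owning class is created (Python 3.6+).
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:        # accessed on the class, e.g. Product.price
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if value < 0:
            raise ValueError(self.name + ' must be >= 0')
        instance.__dict__[self.name] = value


class Product:
    quantity = NotNegative()
    price = NotNegative()

    def __init__(self, name, quantity, price):
        self.name = name
        self.quantity = quantity    # goes through NotNegative.__set__
        self.price = price


book = Product('mybook', 2, 5)
book.quantity = 3
print(book.quantity)                                # 3
print(isinstance(Product.quantity, NotNegative))    # True
try:
    book.price = -1
except ValueError as err:
    print(err)                                      # price must be >= 0
```

Because NotNegative defines __set__, it is a data descriptor, so reads of book.quantity also go through __get__ even though the value is stored in the instance __dict__ under the same name.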

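Python's descriptors and decorators are powerful. Used well, they greatly improve code reuse and readability, but they are also among the harder topics to learn. Reading other people's code helps us understand them faster.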
