About
Congratulations to Melissa Weber Mendonça for her Sphinx video tutorial hitting 20,000 views on YouTube!
Video: Sphinx for Python Documentation Tutorial (~75 minutes)
This tutorial focuses on a high-level explanation of how the Sphinx tool for generating documentation automatically works for Python packages, using NumPy as an example (but won’t be restricted to the NumPy use case). We’ll talk about advantages and disadvantages of choosing different documentation systems and how to integrate other types of documents (for example, Jupyter Notebooks) in the documentation for a given package.
- GitHub repo: minimalsphinx
- Slides: Intro to Sphinx for Python Documentation
About the Speaker: Melissa Weber Mendonça
Melissa is an applied mathematician and former university professor turned software engineer. Nowadays she works at Quansight, developing open-source software and leading the Documentation Team for NumPy. She is also a LaTeX, Fortran and free software enthusiast.
- GitHub: @melissawm
- LinkedIn: @axequalsb
Video Outline: Timestamps
00:00:00 Reshama introduces Data Umbrella
00:05:21 Reshama introduces Melissa
00:06:50 Melissa begins talk
00:07:38 Tutorial Introduction
00:09:01 How Melissa got involved with NumPy / Invitation to collaborate with NumPy documentation
00:10:01 Impostor Syndrome explanation
00:12:08 What is documentation? (Tutorials, How-to Guides, Explanation, Reference)
00:16:54 What is Sphinx?
00:18:00 configuration file: conf.py
00:19:00 Getting started with sphinx (installation) (conf.py, index.rst, _build/html)
00:20:43 What is reStructuredText? (rst files)
00:22:21 demo of MinimalSphinx Project: https://github.com/melissawm/minimals…
00:22:55 pokemon
00:24:05 What is reStructuredText?
00:24:25 Comparing reStructuredText (reST) to Markdown (md)
00:25:51 reStructuredText format II (directives are blocks of explicit markup)
00:27:44 Auto-documenting a python package (autodoc)
00:28:45 Q&A: Do the docstrings have to be in rst?
00:29:30 “:members:” options
00:29:58 Other extensions (sphinx.ext.doctest, sphinx.ext.intersphinx)
00:33:08 Minimal sphinx, example
00:35:00 run “make html” command
00:39:50 Q&A: Should we consider doctest vs pytest? (They are complementary. Use both.)
00:40:30 Example: NumPy docs (how to contribute) (https://numpy.org/doc/stable/dev/)
00:43:20 Getting involved with NumPy, Communicating: https://numpy.org/contribute/
00:43:27 A pull request to the NumPy documentation
00:44:20 Fixing a Typo in NumPy Docs
00:46:20 Fork numpy repo on GitHub
00:47:15 Clone repo
00:47:55 Add remote: git remote add upstream
00:49:05 Branch: git checkout -b doc_fix
00:49:43 Find file: cd doc/source/user/
00:52:30 Set up virtual environment
00:53:40 pip install cython (numpy dependency)
00:56:00 build the docs: “cd doc”, “make html”
01:00:30 Q&A: Can sphinx be used for other languages? (yes)
01:00:55 Q&A: Why did we build NumPy? (we want to generate doc with latest version of numpy)
01:03:33 Jupyter notebooks
01:04:15 Final thoughts by Melissa
01:06:15 Q&A: Why do we build from source?
Transcript
00:00:00 Reshama introduces Data Umbrella
Hello, everyone, and welcome to Data Umbrella’s webinar. So the way the webinar will work is I’m going to do a brief introduction. Melissa is going to do her talk. And then you can ask questions either in chat or Q&A. We’ll answer all the questions, but we’ll answer them when there is a comfortable break in the talk. And this webinar is being recorded. A little bit about Data Umbrella: we are an inclusive community for underrepresented persons in data science, and we are a volunteer-run organization. Briefly about me, I’m a statistician and data scientist. I have a master’s in statistics and an MBA from NYU in business analytics. And I’m the founder of Data Umbrella. I am on Twitter, LinkedIn, and GitHub as reshamas. So if you would like to connect with me, feel free to follow me. We have a code of conduct. The goal of our community is to be as welcoming and inclusive and professional as possible. And so the code of conduct is linked on the sticky in the chat. So please adhere to it and contribute to making this a welcoming and friendly community for all. And the code of conduct applies to the chat as well. There are various ways that you can support our organization. The primary one, the first one, is to follow our code of conduct and contribute to the community. We have a Discord server where you can ask and answer general questions. You can share events and job postings there as well. And we have an initiative where we have transcripts for all of our video events like this webinar right now. And you can help us edit the transcripts that come from YouTube, which are in a raw format and need some editing to be more accurate. Another way that you can support us is to donate to our Open Collective. That is at opencollective.com/data-umbrella, and any donation you can make would be helpful and welcome to cover our operating costs. Data Umbrella is on all platforms.
Depending on the platform of your choice, you can search for us on Meetup. We’re on YouTube with a great video library. We have a job board and a newsletter which goes out once a month. And we also have a lot of resources on our website related to diversity, allyship, inclusive language, communities. Check it out. And the link to our Discord is on our website. We have a couple of really great playlists on YouTube. And one of them is contributing to the open source. We’ve had a series of open source webinars. NumPy, Scikit-learn, pandas, and core Python. So if you want to learn more about that, check out those videos. They’re really great. Career advice is always in demand. And so we have a playlist by three terrific speakers who have shared their insight on careers in tech and data science. So if that is something that is of interest to you, please check it out. And this is just a snippet of all the other events that we have done. So depending on your interest and what you’re looking for, check them out. We have a job board. It’s jobs@dataumbrella.org. And you can also subscribe to it for updates. So check that out as well. We have a highlighted job here, which is a cloud infrastructure engineer at Coiled. And it is a remote position. And you can find out more information about how to apply and the job description on our job board. And I will share some of these key links in the chat as well as soon as I finish presenting. And this is just a reiteration of– we have a lot of resources on our website. And our website is dataumbrella.org. And also another iteration of where you can find us. We are on LinkedIn. We’re on Facebook. We’re on Twitter. And the best place to find out about upcoming events is on our Meetup. We post it in different places, but Meetup is the first place to find upcoming events. We have an upcoming Scikit Learn sprint. The website is afme2021.dataumbrella.org. We do have a wait list at this time. But feel free to check it out. 
And we do have resources there also on contributing. There’s an upcoming event. It’s a community event. So it’s organized by Global Diversity CFP Day, which is on February 20. It was supposed to be last weekend, but was rescheduled due to some issues that came up. But if you would like to get started in speaking at a conference or a meetup or even just learning how to write CFPs, which is a call for proposal, check out these live streams. They have six live streams for each of the regions in the world. So they’re friendly to the time zone that you’re at. So check it out. And the event is free.
I’d like to introduce today’s speaker. I also just want to share briefly why this event was organized. I’ve attended and organized a bunch of scikit-learn sprints. And I was working on a PR, a pull request, for documentation. And it was a mystery to me where these files were being produced. I saw the files produced, but I couldn’t see the code for it. And I was searching on Google. And I realized Sphinx is super powerful, and also complex. And I thought, really, I want to learn more about it. And so I went on a search to find someone who could speak about Sphinx. I went through three or four different people, and that finally led me to Melissa, who I actually have met. I just didn’t know that Melissa was the person to speak to about Sphinx. So I’m glad that I was referred to Melissa. And Melissa is joining us from Brazil.
00:05:21 Reshama introduces Melissa
Melissa is an applied mathematician and former university professor turned software engineer. And she works at Quansight now. Her pronouns are she/her. And she is also a LaTeX, Fortran, and free software enthusiast. And Melissa is on GitHub and Twitter as @melissawm. So check it out. And with that, I am going to turn off my camera and mic and hand over the screen to Melissa.
00:06:50 Melissa begins talk
Thank you, Reshama. I’m so happy to be here. And I just want to point out, as much as I am talking about Sphinx today, it’s not like I have the biggest or most profound knowledge about Sphinx. I think there are other people who could also speak to it. I just think it’s a great opportunity to share and to explain. And sometimes not understanding things is the best way to explain things to other people, because you find the same issues, and you have the same troubles, and you find the same difficulties that other people might face when they start doing this. So thank you so much for the invitation. I’m really happy to be a part of Data Umbrella.
00:07:38 Tutorial Introduction
So I’m going to do a little intro to Sphinx for Python documentation. And this is kind of going to be in two parts. So the first part is going to be more about Sphinx and how it’s built and how to start to understand how it works. And the second part will be more of an applied thing where we’ll fix something in the NumPy documentation. It will be a trivial fix. And those are usually not encouraged in NumPy just because we have so few maintainers and so much to do. But it’s just going to be an example. And if ever you are interested in doing a PR for documentation in NumPy, it can be a good example of how to start. So I have a few links there. The first one is for the slides, the actual slides that I’m showing you. You can access them with that link from hackmd.io. There’s a repo, which is called MinimalSphinx, where I have the tiniest example of a module and how you can use Sphinx to generate the documentation for that module. And then I have a link to the NumPy docs where you can see the kind of documentation that we do.
00:09:01 How Melissa got involved with NumPy / Invitation to collaborate with NumPy documentation
So just to explain who I am and why I’m here, I’ve been working with the NumPy documentation since the beginning of 2020. I’ve been leading the documentation team with some people who are also here. I saw that Ross was here before. I don’t know if there’s other people from the docs team or from NumPy here. But the idea of the docs team is to concentrate our efforts into the documentation that we want to do for NumPy and have a larger vision of how we want this documentation to be improved. So everyone can be a part of the documentation team. You just have to show up. And if you are interested in working in the documentation for NumPy, you can join our open meetings that happen biweekly. So if you’re interested, just ask around. And you can check our contribute page for NumPy as well to see how to do that, how to go to our meetings.
00:10:01 Impostor Syndrome explanation
So first things first, I want to show this picture. And actually, the first time that I saw a picture similar to this one was with Reshama, because we were at the DISC Unconference in New York that was organized by NumFOCUS. And this picture was, for me, awesome, because I could finally understand what I was feeling, and how I could reframe the way that I was thinking about things. So this is about impostor syndrome, which is when you think that you know nothing and that people will finally realize that you’re not supposed to be here or that you don’t know enough to do what you’re doing. And so in your mind, sometimes you have the picture on the left, which is like a big circle of what I think that other people know. And inside, there’s a smaller circle that says what I know. So you have an idea that other people know a lot of things, and you know nothing. But reality is actually closer to the picture on the right, which is a bunch of intersecting circles, each contributing a little bit of the information. So what I know and what other people know are actually complementary things. And they work well together, and not as subsets of each other. So the idea that you cannot contribute because you have nothing to say or you have nothing to add to a project or to something that you want to work with is usually false, because we each have different perspectives and we’ll each bring something to the table that wasn’t there before. So I strongly encourage you to think about contributing if you have the opportunity, if you have the time, and if you have the resources. Diverse communities make our communities better. And having those different points of view helps.
00:12:08 What is documentation? (Tutorials, How-to Guides, Explanation, Reference)
So let’s talk about documentation. What is documentation? It is funny because sometimes talking to people who are not in an open source context, they have a different understanding of what documentation means. So for example, for many people, documentation for a software project means the auto-generated module documentation or the API documentation. This is actually just a tiny piece of what we call documentation in an open source project and in larger projects. So the picture that I’m showing you here is by Divio, which is a company that works with Django. And they have someone called Daniele Procida, who organized the idea of a documentation system. So he realized that we could most of the time divide the documentation for a software project into four parts, which are the four quadrants that you see in the picture. So the API documentation or the auto-generated documentation extracted from docstrings or from the interface of the functions or the methods of your software project, we call that reference documentation. And it is mainly information about how to use the software project. But there are other kinds of documentation, for example, which are learning-oriented or problem-oriented or understanding-oriented. So what are those? If you’re thinking about tutorials, you would think about something educational in which you are trying to explain good practices or ideas and concepts that you want people to figure out while using your project. So tutorials are something that we see often in open source projects. But sometimes it’s hard to tell the difference between tutorials and how-to guides, which is the second kind of document that we’re going to talk about. So the how-to guides are supposed to be problem-oriented and just a series of steps that you would take to solve a problem. 
So right now, we have three different kinds of documents: the reference documentation; tutorials, which are supposed to be educational content, showing you how to do things, but with an underlying idea of teaching you processes, techniques, and best practices; and how-to guides. How-to guides are mostly, how do I solve this problem? Oh, you follow this step, and then this step, and then this step. And in the end, you have a solution. Maybe you don’t understand every step, but that’s OK. So sometimes people call how-to guides tutorials, but that’s something else. It’s interesting to know that you can have those two kinds of focus for documents. And then the last kind of document that we have is an explanation. So what is an explanation? It could be a longer-form document explaining historical developments for the software project or why certain design decisions were made. And so the explanations are not always necessary. Not every project is going to need this, but often they are needed, and they just don’t exist, and they need writing. So I’m showing you this not to overwhelm you with information, but just to explain that there are different types of documentation. And especially if you’re looking to contribute to an open-source project, you might find yourself wanting to contribute to just one of those kinds of documents. And that’s OK. And so there are different options for how you can do this. I also have a link in the slides. So if you have access to the slides, there’s a link to NEP 44, which is a document that we wrote for NumPy. And it expands a bit on the kinds of documentation, the things that we are looking at doing in NumPy. So if you want to know more, you can check that out. Or you can check the Divio website. If you search Google for Divio documentation, you’ll certainly find the documentation system explanation. It’s actually very, very good for whatever project you’re working on.
00:16:54 What is Sphinx?
So in that context, what is Sphinx? I showed you before four different kinds of documentation. And Sphinx, just to be clear, could be used to generate any of them. But usually, you think of Sphinx as a documentation generator in the sense that it extracts docstrings, which are comments that we leave in the code of our Python modules, into HTML. Sphinx can take any plaintext source files and generate readable output. And for our use case, you can think of it as a program that takes in those plaintext files in a special format, which is called reStructuredText, and outputs HTML. Of course, you can output PDFs. Sphinx also can output EPUB and other kinds of documents. But usually, what we use in software projects is HTML, to be able to read them on the web. It needs basically a configuration file, which is a conf.py file. And that is already a hint about how Sphinx works. So it’s basically a Python file with a bunch of dictionaries and lists inside. So it’s supposed to be readable as a Python file. And Sphinx is really extensible. So it has a bunch of different extensions that you can add to it to do extremely powerful things, just like Reshama was saying before. The thing is that it can be overwhelming. Exactly because it is so powerful, it can be overwhelming. So we’ll try to do something more contained and not go too deep and not talk about advanced usages of Sphinx, just because this is an introduction. But there will certainly be much, much more if you search the internet for information on how to use Sphinx. So how does it work? Basically, if you have a project (I wrote “an empty project”, but I meant an empty documentation folder, because you will have your project). Maybe you already have a module or you just have a Python file with some functions written in it. It doesn’t have to be very complicated. If you have a project with a Python file, you can just install Sphinx. 
So for example, you use pip install or you can use conda, whatever your package manager is.
00:18:00 configuration file: conf.py
You can initiate the configuration file using sphinx-quickstart. It will generate a bunch of stuff for you. It will ask for the project’s name, who’s the author, and how you want to organize things. Usually, you can just choose the defaults and it will create a sensible directory structure and organization for your project. You can edit your conf.py file and any other files that you wish, to customize them to your liking. And then you can build the outputs of your documentation when you are satisfied. So usually, in the default configuration, you will have something like make html, because you want to generate HTML. And you can find the generated documents under a folder called build or _build or something like this. So for the defaults, that’s basically how it works. What is this file format that I mentioned to you? So I said that the documentation, the plain text files, would be written in a specific format called reStructuredText. So after running sphinx-quickstart in a project, you get an index.rst file created.
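As a rough sketch of what a conf.py of this kind might contain: the project name, author, and source path below are illustrative assumptions and are not taken from the talk’s repository. The extensions listed are the ones named in the outline, and the settings shown are standard Sphinx configuration values.

```python
# conf.py: a minimal, hypothetical Sphinx configuration (a sketch only).
import os
import sys

# Assumption: the package source lives in ../src relative to the docs folder.
# Adding it to sys.path lets autodoc import the module.
sys.path.insert(0, os.path.abspath("../src"))

project = "Pokedex"    # illustrative project name
author = "Your Name"   # placeholder

# A list: extensions are enabled by their import names.
extensions = [
    "sphinx.ext.autodoc",      # pull docstrings into the docs
    "sphinx.ext.doctest",      # run code snippets found in the docs
    "sphinx.ext.intersphinx",  # link to other projects' documentation
]

# A dictionary: short name -> (base URL, inventory location).
# None means "look for objects.inv at the base URL".
intersphinx_mapping = {
    "numpy": ("https://numpy.org/doc/stable/", None),
}

# Paths to skip when looking for source files.
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]

html_theme = "alabaster"  # Sphinx's default HTML theme
```

Because the file is plain Python, the values are ordinary strings, lists, and dictionaries, which is what makes it readable without knowing much about Sphinx itself.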
00:19:00 Getting started with sphinx (installation) (conf.py, index.rst, _build/html)
Sorry, I just saw the question on the chat: isn’t make a Linux command? Yes. When you install Sphinx, I believe that you will have that command installed whatever system you’re in. It’s just supposed to compile the results into your HTML. I’m not sure, but I think it will work just the same in whatever operating system. Yes, Ross, thanks: make.bat for Windows. Make is actually a compilation command. So it’s not like a Linux or a Windows thing. It just compiles things for you. So if you have a source and you want to compile the result, you can use make, which will read a file with the instructions about how to compile these things into the output that you want. So it will generate the HTML for you.
00:20:43 What is reStructuredText? (rst files)
Coming back to the reStructuredText format, after running sphinx-quickstart, an index.rst file is created. So in order to show you this, I am actually going to go to the repo that I mentioned to you. If you want to see that, you can also go to github.com/melissawm/minimalsphinx. I’ll try and paste it in the chat so you can join. If you go there, there’s a big README with instructions about how to do this for your own project. I’ve done this, and so I can go to the docs folder that’s already here. And I can see the index.rst file that was generated by Sphinx. If I open this file, I can see that it’s very simple. Let me check the raw format because it’s easier to see what’s happening. So this is what I have.
00:22:55 pokemon
My module is actually a Pokedex, where I’m listing three Pokemon, which are these little monsters, creatures that were popular. I think they’re still popular. My son likes them. And then you can adapt this file to your liking like it says here. The first part is actually a comment, so it won’t show up in your rendered documentation. You have a title here, which is Welcome to Pokedex documentation. And you can see there’s a bunch of equal signs under this text. This means that this is a title. And then Sphinx, when it reads this file to generate the HTML, will figure out how to show the results accordingly. And so you have other things that I’m going to mention later, but this is the basic format of your index.rst. Going back to the presentation, reStructuredText is a markup syntax. So you could see that there are little commands and things that you can give it to get the generated content that you want. So in that sense, it’s similar to Markdown, but you have to be careful. Because it is similar to Markdown, it’s easy to get things wrong. And so this is a source of confusion for many people. For example, the standard reST inline markup is one asterisk for emphasis, which is italics, two asterisks for strong emphasis, or boldface, and backquotes for code samples. But you actually need two backquotes instead of one like you would use in Markdown. So you have to be careful about the little differences. And it is possible to use Markdown with Sphinx, but I’m not going to go there at this point.
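The title and inline markup rules described above can be sketched with a few small helpers. These function names are hypothetical and exist only for illustration; they simply build reST source text as strings. The one rule worth noting is that a title underline should be at least as long as the title text, so the helper matches its length exactly.

```python
def rst_title(text, underline="="):
    """Return a reST title: the text followed by an underline of equal length."""
    return f"{text}\n{underline * len(text)}"


def emphasis(text):
    """One asterisk on each side: rendered as italics."""
    return f"*{text}*"


def strong(text):
    """Two asterisks on each side: rendered as boldface."""
    return f"**{text}**"


def code(text):
    """Two backquotes on each side (Markdown would use one)."""
    return f"``{text}``"


if __name__ == "__main__":
    print(rst_title("Welcome to Pokedex documentation"))
    print()
    print(f"Run {code('make html')}; this is {emphasis('italic')} and {strong('bold')}.")
```

A single backquote is also valid reST, but it means something different (interpreted text, often a cross-reference), which is why copying Markdown habits tends to produce surprising output rather than an error.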
00:24:25 Comparing reStructuredText (reST) to Markdown (md)
So the basic usage is using reStructuredText. If you have access to the slides, you can actually click this link where it says reStructuredText for the basics of the syntax and how you do other more advanced things with reST, or reStructuredText. It is a very powerful syntax because it will help you to auto-generate the documentation, including the inline markup that I mentioned, but also custom content, powerful linking, and cross-referencing features. So it is pretty powerful. It can do a lot of things. And if you take the time to learn, I promise it will help you generate that documentation. reST also implements directives, which are blocks of explicit markup, which can have arguments, options, and content. So like I was showing you with the index.rst, I’m going to go back to that file now. So you can see, for example, this is what we call a directive. It’s a toctree directive, which is meant to generate a table of contents for this file. And then it has options, for example, maxdepth, which is the depth of nesting that you want to list in your table of contents, and the caption that you’re going to use for the table of contents. So you can see the characteristics of this. For example, indentation matters. And you have to use consistent indentation to make the compiled documents generate properly. There are other things that it’s possible to do. For example, here you can see another kind of syntax, which is a list. So the asterisks there are meant to be items of a list. And then you have a reference to another document that’s going to be called genindex. I don’t want to go into the details of all of these things right now, because it can be very overwhelming. But you can check and experiment with this in the MinimalSphinx repo. That’s why it’s there. So you can check that out on your own time and experiment and try and see what happens if you change one thing for the other. We’ll see some concrete examples of this in the NumPy documentation. 
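To make the shape of a directive concrete, here is a minimal sketch that assembles a toctree block as text. The helper name and the document name `api` are assumptions made for illustration. The point is the layout: the directive line, then options, then a blank line, then content, all sharing the same indentation.

```python
def toctree(entries, maxdepth=2, caption="Contents:", indent="   "):
    """Build the reST source for a toctree directive.

    Options and entries share one indentation level; a blank line
    separates the options from the content.
    """
    lines = [".. toctree::"]
    lines.append(f"{indent}:maxdepth: {maxdepth}")
    lines.append(f"{indent}:caption: {caption}")
    lines.append("")  # blank line between options and content
    lines.extend(f"{indent}{entry}" for entry in entries)
    return "\n".join(lines)


if __name__ == "__main__":
    # "api" is a hypothetical document name (api.rst next to index.rst).
    print(toctree(["api"]))
```

In practice you would type this block by hand in index.rst; generating it from Python is only a way to show which parts are the directive, the options, and the content.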
You can check a nice summary of reST syntax and a longer reST primer in the links that I’m putting in the presentation. So I think the main thing about Sphinx (I actually didn’t give a historical explanation for this) is that Sphinx was built by Python programmers and developers for Python projects. So it was meant to be specialized for Python. But it’s not anymore. Some people outside Python are also using it. I’ve heard the Linux kernel is using Sphinx to document things as well. And so it is pretty powerful. And it supports the inclusion of your docstrings with an extension called autodoc. So the main idea is that you’ll have your module with your docstrings. And you can tell Sphinx, OK, now you need to extract those docstrings and put them in a generated HTML documentation. Let me just answer a question from the chat. Do the docstrings have to be in RST? Yes, they should use a syntax that is compatible with what you’re doing in Sphinx. There are extensions that allow you to write the docstrings in other formats, for example, in Markdown. So you can check that. If you prefer to use Markdown, for example, many people do. You can use MyST, which I think will do that, which is read the docstrings in other formats and output HTML anyway. So many of those things, for example, the extensions and how you want to extract things from your module, are going to be selected in the configuration file, which is the conf.py file. And you can then document whole classes or even modules automatically using the members option for the auto directives. So this is an example. You would have a module called io. And if you want to just document every class, every function inside this module, you just say members, which means document everything. OK. And just to mention a couple other extensions that might be useful, you can do doctests. 
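A minimal sketch of the kind of docstring autodoc can extract, written in reST and carrying a small doctest. The function name, the evolution data, and the helper `run_docstring_tests` are hypothetical and are not taken from the minimalsphinx repository. The check below uses Python’s standard-library `doctest` module rather than the Sphinx extension, so it runs without Sphinx installed.

```python
import doctest

# Illustrative data only.
EVOLUTIONS = {
    "Bulbasaur": ["Ivysaur", "Venusaur"],
    "Charmander": ["Charmeleon", "Charizard"],
    "Squirtle": ["Wartortle", "Blastoise"],
}


def evolution_chain(name):
    """Return the evolution chain of a starter Pokemon.

    :param name: Name of the starter, such as ``"Bulbasaur"``.
    :returns: The starter and its evolutions joined by arrows.

    >>> evolution_chain("Bulbasaur")
    'Bulbasaur -> Ivysaur -> Venusaur'
    """
    return " -> ".join([name] + EVOLUTIONS[name])


def run_docstring_tests(func):
    """Run the examples in a function's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Arguments: docstring text, globals for the examples, name, filename, lineno.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


if __name__ == "__main__":
    print(run_docstring_tests(evolution_chain))
```

With autodoc enabled, an `automodule` directive with the `:members:` option pointing at the module that contains this function would pull the docstring, including the example, into the generated HTML. If the data changed and the docstring did not, the example would be reported as a failure.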
So I don’t know if this is something that you’re familiar with, but in the same way that you would test your code– for example, you do unit tests, and then you run your tests to see if your package is consistent, if everything’s working well– you can create doc tests, which are little snippets of code that you will put in your docstring. And you can then run those doc tests as you generate the documentation or in a separate comment like this one, which is make doc tests. And it will check if your documentation is generating the same outputs that you want to. So this is great to guarantee that your documentation stays up to date, even if you change your API or if you change the way you’re doing things in your package. If you have doc tests, you can guarantee that the documentation is still relevant to the current version of your package. There’s also another super cool extension called intersphinx, which is an extension that allows you to refer to other projects’ documentation labels easily by using their intersphinx mapping. So for example, I’m going to show it to you later quickly. But in the NumPy documentation, we refer to the SciPy documentation several times. Sometimes we mention something in MapLodlib. Sometimes we’ll mention something from Pandas. And because they share their intersphinx mapping openly, we can actually check which kinds of functions and modules and classes they have available. And you can refer to those elements in your own documentation such that the generated link will not be a link that you write yourself. But it’s going to be something provided by the other package. So what’s the advantage of this is that you don’t have to create those links manually yourself. So suppose that you are citing another project’s documentation, and there’s a new version, and the URL for that module or function changes. And then your reference is not valid anymore. If you’re using the intersphinx mappings, that doesn’t happen. 
Because the actual URL that you have to click to get to the documentation will be generated automatically by the intersphinx extension. So this is really useful if you’re citing other people and other projects. And I’ll try and show you an example later if we have the time. Then I’m going to show you an example of the NumPy docs, but I just want to go back a little because I want to just show you more about the minimal syncs repo. So I’m going to open a console here. I’m on Linux. So it’s not going to be the same for you if you’re in another operating system. But I just wanted to show you quickly some things that we can do with syncs. So what I have– this is the pokidex.py file that is in the minimal syncs repo. And it’s a very, very simple package, really, which only has some definitions of Pokemon and how you can call them which Pokemon they evolved to and stuff like that. So I just want to check out this repo. And I’m going to show you– I think I’m going to do it differently like this. So you can see better. There we go. OK. So what we have here is a directory structure like this. There’s my readme, the license file, and then there’s a source folder which has only one file in it, which is the one that’s open there, which is pokidex.py. And all the other things that are there are actually auto-generated and doesn’t matter for our purposes. So the main file is pokidex.py. And if we go to the docs folder, there is a bunch of .rst files. There’s a make.bat file, a make file, and some other stuff. So I just wanted to show you how you would generate the documentation. My machine’s name is Asoka, as in Asoka Tunnel, because I’m a big Star Wars fan. I’m just going to run the make HTML command here. So like I mentioned, we already have a conf.py, which is a configuration file. So we already started Sphinx in this repo. So we can run make HTML to generate our documents. So the results will be in build HTML. So you can actually access this with your browser. 
If I go to file– HTML. HTML. Oops. Oh, docs. Sorry. There we go. So I just went to the folder that it pointed me to. And then index.html should be my main file, which is the one that I showed you before. So I showed you index.rst, which is this one. And Sphinx generated this HTML. So it’s much nicer. It does have a talk tree with the contents that I told him to include. There’s an indices and tables. Some of those are irrelevant for our purposes here. But the idea is that you can use Sphinx to then generate this nice documentation that you can actually read. And then for example, I have my class starter Pokemon, which is like the parent class. I will generate another class for Bulbasaur, another one for Charmander. I don’t know how to pronounce those names in English. And I have another one for Squirtle. So I can check the docstrings are here, which are these big comments that I put into in the beginning of each class. And I’ll see what they generated with Sphinx. So I want to see what’s with class Charmander. So if I go here and I generate a documentation and I go to API documentation, there’s all the docstrings that I listed before. And they are here properly formatted. You can note that for Charmander, for example, I put a note. This was done with this directive, a note. And then I said, this is something you have to be careful with. And in my generated documentation, this becomes like a highlighted block. So there’s a bunch of different options. This is just one theme. Also, this is the default theme, but you can choose different themes if you want. This is actually a doc test. So if I go to the pokidex.py file, I’m going to show you how that’s done. Here it is. So there’s a function called Who is that Pokemon? And it will show the Pokemon’s name and its evolution. And I will say, hey, Sphinx, this is a doc test. So you have to make sure that those commands that are preceded by the big three greater than signs are actual commands that are going to be executed. 
And the output must be this one. So what I’m going to show you is that I’m going to make this wrong. So I’m going to delete all the evolutions of Bulbasaur and leave only Ivysaur. I’ll save my file. And I’ll come here and show you: if I do make doctest, it will say, whoops, test failed. I expected this because this is what’s listed in your doctest. But I actually got this output. So your doctest is wrong or your API is wrong. So you can check that. It’s a pretty nice feature if you have a big API and you want to make sure things are consistent between versions. This is all that I wanted to show you in the minimal Sphinx repo. But if you want to check that out, please do. There’s other little things that I did, like linking to other places. And you can check that out. [END PLAYBACK]
OK. Should we be considering doctests versus something like pytest? I think those are complementary things. pytest is meant for testing your code, and doctests are for testing your documentation and how that’s done and your docstrings and things like that. I actually think you should be using both. I don’t know if pytest has ways of testing docstrings as well. I’m not sure.
OK. This concludes the first part, which was mainly the explanation about Sphinx and the idea of how to use that to document your code. And then I just want to show you an example from the NumPy docs because I actually am pretty excited to get new contributors. I want to show you how you can do this. And I know that it can be a bit overwhelming to start because NumPy is such a big project. And you don’t know which steps you should take. We actually have a “How to contribute to the NumPy documentation” page in our docs. You can check that out by clicking this link. And I’m just going to show you here. We explain about the documentation team meetings and what you should know before you join. Or how you can contribute fixes and all that you need to know to contribute. So please check that out.
And if you have feedback about that page, please let us know. It would be also super useful to be able to make that friendlier for people who want to join. So like I mentioned, we do have a documentation team. So if you’re unsure about how to contribute, you can always come and ask us for help. There will most certainly be people around to help you. And we’re happy to mentor people if they want to get started. So please just join. Don’t worry if English is not your first language. It’s not mine either. And we’ll make sure there are native speakers to make corrections or make sure things make sense. So you can contribute what you have. Start from where you are. And we can help you get there. There’s a couple ideas about how to contribute. You can open an issue requesting content that you think is missing. Maybe you lack an explanation or something. Or you would like to see a specific tutorial. You can just open an issue and suggest that that’s written. The more specific, the better. So if you can list, for example, the things that you would like to see in a document, that would be great. You can suggest video content as well. So we are looking into experimenting with that. So if you have suggestions for good video content that we could generate to help you or other people contribute or understand how NumPy works, let us know. And you can join our communications channel. So we mainly use Slack for our internal communication, meaning if you want to contribute and you want to know more and you want to get in touch with other people who are contributing to NumPy, you can just ask for an invitation for our Slack channel. We don’t vet people. You can just ask and we’ll add you. But it’s a nice place to ask questions privately if you prefer. If not, there’s GitHub and there’s our mailing list. So you can check our communications channel at numpy.org/contribute. OK, so I just want to finish with a pull request to the NumPy documentation. 
So I’m not actually going to do the pull request, but I’m going to show you how to get there. So I want to thank Fatma Trelasi because she’s the one who suggested this to me. There is a typo in our Quick Start guide. So I’m going to open this for you here. We have a Quick Start tutorial, which is like a high-level explanation of things that you can do with NumPy, basic functions and methods that you can access, and things like that. And so if you go to Quick Start, stacking arrays, which is one section of this Quick Start, there is a typo. So if you go here, I think I missed a link. The text is this one: “In general, for arrays with more than two dimensions, hstack stacks along their second axes, vstack stacks along their first axes, and concatenate allows for an optional arguments giving the number of the axis along which the concatenation should happen.” So I think this word should not be here. So I’m going to just search for “concatenate allows” so that we can find this quickly. Here we go. This is the offending word. So we would like to remove that. We’ll just look into the files and do that.
However, because of things and how things are done, and actually because of... let me just clear my screen. Because of how NumPy is built and how the documentation is built, the documentation for NumPy includes those docstrings that I was mentioning before. This means that you have to have the whole source code for NumPy to be able to touch the docs. So we will actually do that now. So I’m just going to do the following. I’m going to split my screen so we can see the instructions on the left and the commands on the right. I hope that’s not too small. If it is, let me know, and I can maybe make that larger.
So first, we have to look at the NumPy repository on GitHub. [TYPING] So if you go there, you can fork this repository to your own. So for example, you can go here and fork this repository to your own account. So I already have a fork, so I’m not doing this step.
But this should be your first step in case you’ve never contributed to NumPy before.
“It’s the plural ‘arguments’ that should be changed to singular.” That’s very possible. We can take a look at that. For the purpose of this presentation, it really doesn’t matter. It’s just more of an example of how to find the file and compile things. But yeah, I think maybe we should check that.
OK, so once you have NumPy forked into your own GitHub profile, you can just git clone git@github.com:your-username/numpy.git. This will create a folder called numpy in your own directory structure. So if you’re in your home folder, it will create it inside your home folder. If you’re in a subfolder, it will create it there. You’ll find it later. So let’s see it cloning. Cool. Now, because our final intent is to submit a PR, it’s nice to add a remote called upstream pointing to the main GitHub repo. And I’ll explain what this means in a bit. OK, I have to enter my NumPy folder first. I am in numpy. Now I can add the remote. And I’ll show you what this means. I think this is too big. I’m going to try and make it like this. That’s better. So if you ask it what’s happening now, you have two separate sources for your code. You have origin, which is your own fork. And then you have upstream, which is the main GitHub repo. This is useful if you want to sync your repo to the original NumPy repo later, or if you want to submit your pull request directly. It’s interesting to have those two remotes. So origin is yours, and upstream is NumPy.
Now you can maybe check out a branch. So I’ll check out a branch called docfix. And this will be the branch that I’ll be working on to do the changes. If you’re not familiar with Git, maybe you can check out a Git tutorial. And this should be just a standard workflow for any PR. So there’s nothing different about NumPy here. So now is the time when we have to do our fix. Now that we have our developer environment set up, we can do our fix.
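The fork, remote, and branch steps just described can be summarised as a short list of git commands. The sketch below only prints them; `your-username` is a placeholder, the upstream URL is the public NumPy repository, and `git -C numpy` is used here in place of changing into the folder first, which is a choice made for this example rather than what was typed in the talk.

```python
import shlex


def fork_workflow_commands(username, branch="docfix"):
    """Return the git commands for the fork-and-branch workflow as argument lists."""
    return [
        # Clone your own fork; this creates a folder called "numpy".
        ["git", "clone", f"git@github.com:{username}/numpy.git"],
        # Add the main repository as a second remote named "upstream".
        ["git", "-C", "numpy", "remote", "add", "upstream",
         "https://github.com/numpy/numpy.git"],
        # List the remotes: "origin" is the fork, "upstream" is NumPy itself.
        ["git", "-C", "numpy", "remote", "-v"],
        # Create the working branch and switch to it.
        ["git", "-C", "numpy", "checkout", "-b", branch],
    ]


if __name__ == "__main__":
    for command in fork_workflow_commands("your-username"):
        print(shlex.join(command))
```

Printing rather than executing keeps the sketch safe to run anywhere; each printed line can be pasted into a terminal.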
So we can actually check for the file that we said we were going to change. So this is the Quick Start tutorial. And I don’t know if you can see here, because it’s very, very small. But it says numpy.org/doc/stable/user/quickstart. So that gives you an idea of where this file is. If you look inside the numpy folder here that you have downloaded and cloned from GitHub, it will have a doc folder. Inside the doc folder, you can see source, which has all the RST files that we mentioned before. And then because our document is under user, this should also be under user here. Now that I am inside the user folder, there’s a bunch of things, including my Quick Start document, which is here, quickstart.rst. So I’m going to check this with nano just because it’s easier. You can do this with the editor that you prefer.
So in this file, Quick Start, there’s a bunch of things, including the phrase that we were mentioning we thought was not right. So it was under... is it shape manipulation? I think I can search for “concatenate allows”, just like we did before. Oh, no. Maybe it’s a line break. Here you go. “Concatenate allows for an optional arguments.” Maybe we need to delete the s, like someone mentioned in the chat. So let’s do that: “for an optional argument, giving the number of the axis along which the concatenation should happen.” OK, so we did our fix. Let’s save this file. Let’s exit. And now I’m just going to go back to the root folder because I want to show you some other stuff. Now that we are here, we would like to build the documentation and see if our fix actually happened and if everything’s going fine, right?
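The way the page address hints at the source file can be written down as a small helper. This is a sketch under one assumption taken from the walkthrough above: the part of the URL after `/doc/<version>/` mirrors the layout of `doc/source` in the repository. The function name is made up for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def rst_source_for(url):
    """Guess the .rst source path (relative to the repo root) for a docs URL."""
    parts = PurePosixPath(urlparse(url).path).parts
    # Expected shape: ('/', 'doc', '<version>', 'user', 'quickstart.html')
    if len(parts) < 4 or parts[1] != "doc":
        raise ValueError(f"not a documentation page URL: {url}")
    page = PurePosixPath(*parts[3:]).with_suffix(".rst")
    return PurePosixPath("doc", "source") / page


if __name__ == "__main__":
    print(rst_source_for("https://numpy.org/doc/stable/user/quickstart.html"))
```

The guess only applies to hand-written pages; API reference pages are generated from docstrings in the Python sources, so their text lives elsewhere.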
So we would have to set up our developer environment because we have never built NumPy before here. So what I’m going to do is create a virtual environment. You can use Conda if you want. It’s up to you what’s your preferred way of doing this. But it’s a good idea to use some kind of environment setup so you don’t have trouble with your operating system’s Python. So I’m going to activate my virtual environment. And I’m going to pip install with the file called doc_requirements. So because NumPy is so big, we have two files, one called doc_requirements and another called test_requirements. And those are two separate files, with separate packages that you should install with their versions specified. So if you’re just going to build the documentation, you don’t necessarily have to install the test requirements, only if you want to run the complete tests before submitting your PR, for example. We also have to install Cython because NumPy depends on it. So I’m going to install it now. We’re good to go.
So now we should compile NumPy. So I am going to copy this. And you can actually use this command as it’s more convenient for you. So for example, for me, I have 12 cores in this machine. So to make it faster, I will use -j12. So while this is compiling, if you want to ask questions in the chat, I’m available to answer them. I think it’s going to be a couple of minutes while it compiles. Thanks, Ross, for being here. [AUDIO OUT]
So yeah, I just want to mention, if there are no questions, and even if there are, while you’re thinking about them, I want to mention that if you have questions or if you don’t know how to approach this and you want to help NumPy or any other project, there will most certainly be people available to help you, either doing reviews for the PR or answering questions in forums or communication channels for the project. So for NumPy, I can assure you that there are many people available to answer questions.
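The environment setup described above could be collected as follows. This is a sketch under assumptions: the requirements file is taken to be `doc_requirements.txt` in the repository root, `.venv` is an arbitrary directory name, the paths follow the POSIX virtual-environment layout, and the NumPy build command itself is left out because the talk only shows it on screen.

```python
import os
import shlex
import sys


def docs_environment_commands(env_dir=".venv"):
    """Return the setup commands as argument lists (nothing is executed)."""
    # POSIX layout; on Windows the interpreter lives under Scripts\ instead.
    venv_python = f"{env_dir}/bin/python"
    return [
        # Create an isolated environment so the system Python stays untouched.
        [sys.executable, "-m", "venv", env_dir],
        # Install the pinned documentation dependencies.
        [venv_python, "-m", "pip", "install", "-r", "doc_requirements.txt"],
        # Cython is needed to compile NumPy from source.
        [venv_python, "-m", "pip", "install", "cython"],
    ]


def parallel_flag(cores=None):
    """Build a -jN flag from a core count, defaulting to this machine's."""
    return f"-j{cores or os.cpu_count() or 1}"


if __name__ == "__main__":
    for command in docs_environment_commands():
        print(shlex.join(command))
    print("parallel build flag:", parallel_flag())
```

Calling the environment's own interpreter with `-m pip` has the same effect as activating the environment first and then running `pip`.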
And nobody expects you to know everything from the get-go. So if you’re interested in contributing, you can always come and ask questions and figure out how you can best contribute.
Is Cython CPython? I don’t know. I don’t think so. Obrigada, Stephanie. No, CPython is the Python implementation that most of us use, with an underlying C implementation.
OK, so I just want to mention that my compilation stopped. So I have NumPy on my machine now, and I just have to build the docs. So to build the docs, I will go into the doc folder. There’s a bunch of things there. You can actually see a bunch of, for example, tests.rst.txt. You can open those files and see what they look like, if you want. But to compile, you just do the same command that I mentioned before, make html. So now we’re going to get something... OK, so because this is the first time that I was building, it asked me to make dist first before make html, just because I just rebuilt NumPy. So it was going to give me... yeah, this is something that we identified yesterday, and it was not supposed to happen here. But I’ll just fix it. Just give me a second. [TYPING] So this is not supposed to happen. I think this should work. [TYPING] OK, I’m going to try and fix this while you ask questions, if that’s OK. [TYPING] So you see, this was done by Ralf just this morning. Maybe I can actually fix this in a better way. [TYPING] Which is git fetch upstream. [TYPING] OK, so this should work, but I’ll have to rebuild. I’m sorry, folks. So we identified this yesterday, actually, which is an error in setup.py. So you can see the actual commit to fix this issue right here by Ralf. So we identified this yesterday, and I thought it was working already. So OK, we had already built this, so it doesn’t have much to do. So I’ll try and build the docs again. Yay, that works.
So while it’s building, let’s go back to the questions, maybe. I’m usually... Charlie, you’re using Conda. That’s also a great idea.
Me, personally, I prefer Conda, so I use that all the time. When presenting, I chose to use virtualenv because I realize that many people are mostly living in virtualenv and using pip. But if you want to use Conda, it should also work. Yes, that is the nature of live coding. And it’s also nice to see, for example, this error: I identified it the other day, and then I mentioned it in our Slack, and Ralf was patiently debugging with me for some time until we found the error and he fixed it. So that’s kind of how it goes when you find those errors.
Can Sphinx be used for other languages? I think it can. It probably depends on extensions of some sort. But yes, I think it’s pretty flexible right now. I know that there are some extensions for JavaScript or MATLAB. I know that there’s an extension for MATLAB.
Why did we build NumPy? OK, so like I’m doing now, make dist. This means that you want to generate documentation for the newest NumPy version. And so what we did is we created a virtualenv that didn’t have NumPy installed. We just cloned it. But then to build the documentation, you want to run the doctests, right? So you want to actually be able to import NumPy. So you have to build NumPy and install it before you can build the docs, so that the doctests pass and NumPy can actually be imported.
And I don’t think that’s a newbie question. I don’t agree with the newbie, like beginner, advanced, intermediate distinction. I mention this all the time because each of us has our own experience. And maybe we are very experienced in genome sequencing. But we don’t know how to do that in Python. Does that mean we’re like a beginner? I don’t think so. I think we’re maybe a newcomer to this package. But that doesn’t mean we don’t have other things that we know a lot about. So this takes a while. I wish we could accelerate that, but I don’t think we can. So it wouldn’t have worked if we had installed it.
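One quick way to see which NumPy the docs build would import is to ask Python directly. The sketch below is an illustration, not part of the NumPy build tooling; it relies on the fact that NumPy development builds carry a `.dev` segment in their version string, and the helper names are made up for this example.

```python
import importlib.util


def is_dev_version(version):
    """Return True for development version strings such as '1.21.0.dev0+438.gabc'."""
    return ".dev" in version


def describe_numpy_install():
    """Report whether NumPy is importable here and where it comes from."""
    if importlib.util.find_spec("numpy") is None:
        return "numpy is not importable: build and install it before building the docs"
    import numpy

    kind = "development build" if is_dev_version(numpy.__version__) else "released version"
    return f"numpy {numpy.__version__} ({kind}) imported from {numpy.__file__}"


if __name__ == "__main__":
    print(describe_numpy_install())
```

If the reported location is a released wheel inside `site-packages` rather than the clone you just built, the docs build is likely to complain about a version mismatch.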
It will usually complain about a different version like it did here. So I don’t think it would work. You have to have the source code to be able to build the docs. Yeah, I’m so sorry, folks. I don’t think that’s going to work. And it’s something that has been working before. Yeah, it will give me an error. And I think it’s just because we found this error recently. And it’s just bad luck that it’s not fixed right now. But it’s going to be soon, I promise. I don’t think I can extend this for too long. Otherwise, you cannot follow, actually. So yeah, the final step is this one, which is building the HTML docs and then submitting your pull request once you are satisfied with your changes.
I just wanted to leave a small note about Jupyter Notebooks. It’s going to be very, very short. It is possible to use Jupyter Notebooks inside your Sphinx-generated documentation. We are actually doing something like this in our NumPy tutorials repo. So in this repository, you can find some Jupyter Notebooks that have been converted using Sphinx to an HTML site. This is still in progress. So you can check that out. And it’s supposed to contain only tutorials. So thinking about the four quadrants of documentation, this will be only tutorials.
So I had some final thoughts, just mentioning how challenging Sphinx can be. It’s not supposed to be as challenging as I’m finding it now, just because there’s a recent error in the build for the documentation. But it is sometimes challenging. So it’s nice to have someone next to you who can help you understand what’s going on. If you prefer using only Markdown, you can check out MyST, like I mentioned before. There are many other interesting extensions for Sphinx. And for your own project, if you’re using Sphinx to build the documentation, you can use Read the Docs, which is an excellent way to serve those documents on the web. There’s also a bunch of interesting resources about documentation from the Read the Docs folks.
So those links are all clickable. And you can check them out in the slides later. That’s it. I’m so sorry I didn’t manage to build the docs in the end. And this is something that worked yesterday. So yeah, I’m sorry. But that’s the nature of live coding and the nature of open source projects as well, I have to say. Oh, Reshama, I think you’re muted.
There I am. OK, yeah. I was going to say, yeah, you never know, because things are... it’s like this ecosystem is constantly changing. And something that worked yesterday, things need to be... so yes, that is the nature of it. It was a great presentation.
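For the Markdown and Jupyter Notebook options mentioned in the closing notes, a Sphinx configuration fragment might look like the sketch below. The talk does not say which notebook extension the NumPy tutorials repository uses; `nbsphinx` is shown only as one existing option, and both extensions are separate packages that have to be installed (`myst-parser` and `nbsphinx`).

```python
# conf.py (fragment): a minimal sketch, not the NumPy configuration.

extensions = [
    "sphinx.ext.autodoc",  # pull API documentation out of docstrings
    "sphinx.ext.doctest",  # provides the `make doctest` builder
    "myst_parser",         # lets Sphinx read Markdown (MyST) sources
    "nbsphinx",            # lets Sphinx render .ipynb notebooks
]

# Map file extensions to the parser that should handle them.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```

With a configuration along these lines, `.md` files and notebooks can be listed in a toctree next to the `.rst` files.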
01:06:15 Q&A: Why do we build from source?
I did have a question for you about... I think the question about why do we build from source? And let me know if I’m interpreting this right, because I was wondering the same thing with scikit-learn, where I have a little bit more experience, which is: there’s the stable version of the docs, which is by release. And then there’s the dev version, which is the development version. And so when we’re building the virtual environment, we don’t want to use our latest released version of the library. We want to use the one in development. Right? Yes, yes. It’s like a lie. It’s like things have been fixed, and they’re constantly being fixed. And I think that’s why we’re doing it, right?
Melissa:
Yes, exactly. And not only that, but also the idea of having the most recent changes to make sure you are not overriding someone else’s work, because that will happen if you don’t have the latest one. So this is why we keep our fork in sync, and we try to do that. Because of course, when you submit your changes, there will be continuous integration. There will be tests. There will be other ways of checking that you are not doing something that no longer follows what the API says, or something like that. But yeah, you’re supposed to be working with the latest development version.
Yeah, because one time I had actually opened up an issue, submitted a PR for a documentation problem. And I was looking at the stable version, and it had already been fixed in the development version. Once I understood that, I was like, oh, now I understand. Like I wasn’t looking at the final docs in the development version. So it was a good learning point for me as well. So I’m always amazed, every time I work on even a small documentation fix, which is adding a line, how much I learn about the whole process from one fairly simple PR.
Yeah, absolutely. And I think it’s pretty amazing to work on a project like this one. NumPy is a huge project and an important one as well, which sometimes can be intimidating, right? Because you’re doing big changes, and you don’t know if that’s going to work. So it is interesting that you can find ways to contribute at a low emotional cost, low-impact things first, if that’s something that concerns you. So it is nice to be able to touch the documentation and see how that goes, see the workflow, see how you interact with people, how the community behaves, and all that. So that’s interesting.
That’s great. It would be cool to do something with NumPy that’s hands-on for some of our members.
Yeah, I think that this is something that we can talk about. I’d be open to it. I think other people would as well.
Yeah, it’s cool.
I love seeing some of the parallels with Scikit-learn and some of the things that are done a bit differently. So yeah, it was a great learning experience. So I don’t see any more questions. But if anybody else does have any more questions, now is the time to post it in the chat. And just for people to know, I’m going to have the video up. I try to have it up within 24 hours on YouTube. I will take a lot of the links from the chat and put them in a transcript so people can easily access the links that have been posted in the chats as well. And yeah, I think that’s all that I have to say. Melissa, if you have anything else to say, thank you so much for doing this presentation. Yeah, I just want to apologize again for the build problem. But it should be fixed soon. Other than that, it was a great pleasure. And if you ever want me to come back, I’d be happy to. Absolutely. Thank you.
OK, now that we have done our fix, we can build the docs and see how that worked. So we can do make html inside the numpy/doc folder and see what happens. So this will use Sphinx to build all the files. And this might take a while, exactly because it’s reading all of the sources, including the docstrings, including the auto-generated documentation, and the extra documents that we have written in RST format. So now that it’s finished reading sources, it’s getting to the end, actually. So it’s executing all the commands that we have inside our documentation. For example, for the SVD tutorial that you were looking at there, there is an image manipulation aspect. And so this “clipping input data” message that you’re seeing is actually coming from the code inside our documentation. So Sphinx is building everything, executing all the commands that are listed there, and now writing the output to HTML. So you can see that this really takes a while.
And once it’s all built, all the HTML is going to be generated inside the build folder, in the doc folder, in the root numpy folder. So we can go to our browser, in my case home/melissa/numpy/doc, and then you can see all the directory structure under there. So we’ll go to build/html. Remember that Quick Start, the file that we changed, was under user. So we can go to user/quickstart.html, and we’ll see our fix, fortunately. So if we go to “concatenate allows”, it will say “for an optional argument, giving the number of the axis along which the concatenation should happen.” So now our fix is done, and we can submit our PR if we want to.
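Locating the rebuilt page can also be scripted. The sketch below assumes the layout described above, with the HTML output under `doc/build/html` inside the cloned repository; the helper name and the default page are chosen for this example.

```python
import webbrowser
from pathlib import Path


def built_page(repo_root, page="user/quickstart.html"):
    """Return the absolute path of a built HTML page inside the repository."""
    return Path(repo_root).resolve() / "doc" / "build" / "html" / page


if __name__ == "__main__":
    target = built_page("numpy")
    if target.exists():
        # Open the local file in the default browser to check the fix.
        webbrowser.open(target.as_uri())
    else:
        print(f"{target} not found: run `make html` in the doc folder first")
```

Checking for the file first avoids opening a browser on a page that has not been built yet.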