If you are a Python developer eager to take your first steps into data analysis, then this book is for you. Throughout this book, you will learn different ways to analyze data using NumPy and pandas.
The book opens with Setting Up a Python Data Analysis Environment, which discusses installing Anaconda and managing it. Anaconda is a software package that will be used throughout this book. Next, Diving into NumPy discusses NumPy data types, which are controlled by dtype objects; these objects are the way NumPy stores and manages data. Operations on NumPy Arrays covers what every NumPy user should know about array slicing, arithmetic, linear algebra with arrays, and employing array methods and functions.
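As a taste of the NumPy topics listed above, here is a minimal sketch that touches dtype objects, slicing, arithmetic, linear algebra, and an array method. The array values and variable names are made up for illustration and are not taken from the book's chapters.

```python
import numpy as np

# A small 2-D array with an explicit dtype; the dtype object records how
# each element is stored (here, 64-bit floats).
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
dtype_name = a.dtype.name       # name of the element type
item_size = a.dtype.itemsize    # bytes used per element

# Slicing: the first row, and the last two columns of every row.
first_row = a[0]
last_two_cols = a[:, 1:]

# Elementwise arithmetic: every element is multiplied by 2.
doubled = a * 2

# Linear algebra: matrix product of a (2x3) with its transpose (3x2).
gram = a @ a.T

# Array method: sum down each column.
col_sums = a.sum(axis=0)

print(dtype_name, item_size)
print(gram)
print(col_sums)
```

Note that slices such as `a[:, 1:]` are views on the original data rather than copies, so modifying the slice also modifies `a`.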
The next chapter introduces pandas and looks at what it does. It explores pandas Series and DataFrames and how to create them. Arithmetic, Function Application, and Mapping with pandas revisits some topics discussed previously, regarding applying functions and arithmetic to a multivariate object and handling missing data in pandas. Managing, Indexing, and Plotting looks at sorting and ranking and how to achieve them in pandas, along with hierarchical indexing and plotting with pandas.
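The pandas topics above can be previewed with a short, hedged sketch. The data, labels, and variable names here are invented for illustration; plotting is left out because it needs a graphical backend.

```python
import numpy as np
import pandas as pd

# A Series and a DataFrame sharing the same index labels.
s = pd.Series([3.0, np.nan, 1.0], index=["a", "b", "c"])
df = pd.DataFrame(
    {"x": [3, 1, 2], "y": [10.0, np.nan, 30.0]},
    index=["a", "b", "c"],
)

# Arithmetic aligns on index labels; a missing value propagates as NaN.
total = df["x"] + s

# Function application: apply a function to each column in turn.
# max() and min() skip NaN by default.
col_range = df.apply(lambda col: col.max() - col.min())

# Handling missing data: fill the gaps, or drop incomplete rows.
filled = df.fillna(0.0)
dropped = df.dropna()

# Sorting and ranking by the values in column "x".
sorted_df = df.sort_values("x")
ranks = df["x"].rank()

# Hierarchical indexing: a two-level index, selected by the outer level.
idx = pd.MultiIndex.from_tuples(
    [("g1", "a"), ("g1", "b"), ("g2", "a")], names=["group", "item"]
)
h = pd.Series([1, 2, 3], index=idx)
g1 = h.loc["g1"]

print(total)
print(sorted_df)
print(g1)
```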
A number of text conventions are used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles.
In this chapter, we'll discuss installing Anaconda and managing it. Anaconda is a software package we will use in the following chapters of this book.
Anaconda covers both Python and R; in this book, we'll focus on the portion devoted to Python. Anaconda helps us use these languages for data analysis applications, including large-scale data processing, predictive analytics, and scientific and statistical computing. Continuum Analytics provides enterprise support for Anaconda, including versions that help teams collaborate and boost the performance of their systems, along with a means of deploying models developed using Anaconda. Anaconda therefore appears in enterprise settings, and aspiring analysts should be familiar with its use. Many of the packages used in this book, including Jupyter, NumPy, pandas, and many others common in data analysis, are included with Anaconda.
We will explore Jupyter Notebooks, the primary tool we will use for data analysis with Python. We will see what Jupyter Notebooks are, and we will also talk about Markdown, which is what we use to create formatted text in Jupyter Notebooks. A Jupyter Notebook contains two types of blocks (cells): executable blocks of Python code, and formatted, human-readable text blocks.
Spyder is an IDE, unlike the Jupyter Notebook or the Jupyter Qt Console. It integrates NumPy, SciPy, Matplotlib, and IPython. It is extensible with plugins, and it is included with Anaconda.
Rodeo is a Python IDE developed by Yhat and intended exclusively for data analysis applications. It aims to emulate the RStudio IDE, which is popular among R users, and it can be downloaded from Rodeo's website.
Conda allows us to create and manage multiple environments, so that multiple versions of Python, R, and their relevant packages can coexist on one machine. This can be very useful if you need to develop for different systems with different versions of Python and their packages. Besides managing Python and R versions, conda also facilitates the installation and management of packages.
As mentioned earlier, Anaconda allows you to manage multiple versions of Python. You can search for the versions of Python available for installation, verify which version of Python is in an environment, and even create environments for Python 2.7. You can also update the version of Python in the current environment.
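One simple way to verify which Python version an environment provides is to ask the interpreter itself. The following is a minimal sketch using only the standard library, run from inside the environment you want to inspect; it is not conda's own mechanism, and the variable names are illustrative.

```python
import platform
import sys

# Version string of the interpreter that is running this code.
version = platform.python_version()

# The same information as integers, convenient for comparisons.
major, minor = sys.version_info[:2]

# Directory of the environment this interpreter was started from.
prefix = sys.prefix

print(f"Python {version} running from {prefix}")
```

Running the same snippet in two different environments should report each environment's own interpreter version and location.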