“Visualizing information can give us a very quick solution to problems. We can get clarity or the answer to a simple problem very quickly.” — David McCandless
Visualizing data is one of the essential steps in any data science project. It makes it easier to find patterns, detect anomalies, and communicate your results efficiently.
The process of visualizing data, however, can be a bit tricky. Today, there are many plotting tools and libraries that we can use to bring our data to life through charts and colors. Some of these tools are quite extravagant and expensive.
So, how can one decide what to use?
This article will, hopefully, help you answer that question.
In this article, we will cover the top 10 plotting libraries in Python, go through some usage examples, and look at how to choose one of them for your next visualization adventure.
But before we get into that, let’s first talk about the two types of plots we can generate.
Static vs. dynamic plotting
When plotting any information, we have two options: we can generate either a static plot or a dynamic one.
Static plotting
Static plots display a fixed relationship between two or more variables. Once the plot is created, the user can’t change any aspect of it.
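To make the idea concrete, here is a minimal sketch of a static plot, assuming Matplotlib (introduced later in this article) is installed. The sine-wave data and the file name static_plot.png are placeholders I picked for illustration.

```python
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: one period of a sine wave
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x)

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, y)
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("A static plot")

# Write the figure to an image file (hypothetical file name).
# Whoever opens the image can look at it but cannot change it.
out = Path("static_plot.png")
fig.savefig(out, dpi=300, bbox_inches="tight")
```

The saved image can go straight into a report or a paper, which is the typical home of a static plot.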
Dynamic plotting
Dynamic plots, also known as interactive plots, are used when the developer or designer wants users to interact with the plot, changing some aspects of it and getting more familiar with the data used to create it.
How to choose?
Okay, so you have some data that you want to visualize, but you don’t know where to start. Let me help you out. Whenever I start a project and need to create some visualizations, I ask myself four questions that lead me to the right choice.
Q1: What is my target platform/media?
The first thing you need to decide is which plot type you need: static or dynamic. Static plots are usually used in printouts, technical papers, or reports. In these cases, you need to tell your audience something, and they don’t need to interact with the plot itself.
However, if you’re using the plot in an online tutorial, in a class, or in any web application where users can play around with the data to understand it better or to use it elsewhere, then you should create a dynamic plot.
Q2: Is my data publicly available?
This is a crucial thing to consider. If your data is private and not publicly available, then you need to use static plots. But if the data is stored on a public service that doesn’t require special permission to access, then a dynamic plot may be a better option.
Q3: What is my priority?
Once I have decided between static and dynamic plotting, I ask myself: what is the priority of my visualization? Do I need it to be complex, with many layers? Answering this question helps me choose the correct library.
Q4: Do I need a special kind of visualization?
Finally, I ask myself what kind of plot I need. Is it a simple chart, such as a bar, column, pie, or donut chart? Or do I need to plot something more specialized, such as a network or a map?
If I need to visualize general information, then I can use any library that offers my desired chart type. However, if I need to create a map or a network, that limits my options and helps me make the decision faster.
Top 10 Python Plotting libraries
Python is one of the most used programming languages in data science and many other applications. Because of that popularity, Python has many data visualization libraries to choose from. The wide variety of options is both a good and a bad thing.
Having many options means you can choose the library that matches your targets entirely, but it can be confusing both for new people joining the field and for experts deciding what to choose.
Here, I will go through the top 10 Python libraries out there, and how and when to use them. I divide those libraries into two categories: libraries used to plot static charts, and those used for dynamic graphs.
Let’s get visualizing…
Static plotting libraries
Matplotlib
We can’t talk about data visualization in Python without mentioning the first and oldest Python visualization library of them all, Matplotlib. Matplotlib is an open-source library that was created back in 2003 with a syntax close to MATLAB. Since then, the library has gained a lot of love and support that continues to this day.
Many Python packages are built upon the Matplotlib core. For example, Seaborn and Pandas act as wrappers around Matplotlib, allowing the user to create graphs with fewer lines of code.
When to use Matplotlib?
- If you’re familiar with MATLAB, Matplotlib will look familiar and will make your transition easier.
- If most of your data is time-series data, plotting it with Matplotlib can get a bit complicated.
- Matplotlib is very powerful in dealing with static 2D plots. However, it gets quite complicated if you want to plot 3D or interactive visualizations.
- Matplotlib is a very low-level library, which means you need to write more code to get a visualization working.
- Matplotlib was not designed for data exploration purposes, so if your main goal is to do that, you might be better off using another library.
Usage Example
#Import needed libraries
import numpy as np
import matplotlib.pyplot as plt
#Add this if you're using Jupyter Notebook
%matplotlib inline
#Generate data
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = (30 * np.random.rand(N))**2 # 0 to 15 point radii
#Plot scatter plot
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()
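To illustrate the points above about time-series data and low-level code, here is a rough sketch of a dated line plot. The random-walk values and the file name time_series.png are made up for illustration. Notice how the tick placement and date labels are configured by hand.

```python
from pathlib import Path

import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Illustrative data: one value per day for the first three months of 2021
dates = np.arange("2021-01-01", "2021-04-01", dtype="datetime64[D]")
values = np.cumsum(np.random.randn(len(dates)))  # a random walk

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(dates, values)

# The date axis is set up explicitly:
# one tick per month, labelled like "Jan 2021"
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
fig.autofmt_xdate()  # rotate the labels so they don't overlap

ax.set_xlabel("Date")
ax.set_ylabel("Value")

# Hypothetical output file name
out = Path("time_series.png")
fig.savefig(out)
```

None of this is hard, but it is more code than the higher-level libraries below usually need for the same chart.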

Seaborn
Seaborn is one of the libraries built upon Matplotlib. It acts as a wrapper that provides users with a high-level alternative to Matplotlib. You can create the same visualizations as in Matplotlib but with far fewer lines of code.
Since Seaborn is built upon Matplotlib, it offers the same chart types as Matplotlib in addition to some cool charts such as heatmaps and violin plots. Seaborn can also be used to make Matplotlib charts more visually appealing.
When to use Seaborn?
- I always recommend that if you’re using Matplotlib, you should use Seaborn with it to make your visualizations better.
- If you’re starting with Python and data science, Seaborn is an easy and straightforward library that you can use to create stunning charts with little to no effort.
- Seaborn offers easy customization methods to add your own touch to your graphics. It gives you complete control over the color palette of the created graphs.
- Seaborn has many statistically minded built-in plots that you can use easily, such as facet plots and regression plots.
Usage Example
#Import needed libraries
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
#Add this if you're using Jupyter Notebook
%matplotlib inline
#Generate data
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
#Plot scatter plot (recent Seaborn versions expect x and y as keyword arguments)
sns.scatterplot(x=x, y=y, hue=colors, size=colors, legend=False)
plt.show()
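As a quick taste of the statistical plots mentioned above, here is a small sketch of a regression plot with sns.regplot. The noisy linear data and the file name regression.png are invented for the example.

```python
from pathlib import Path

import numpy as np
import seaborn as sns

# Illustrative data: a noisy linear relationship
rng = np.random.default_rng(0)
x = rng.random(50)
y = 2 * x + rng.normal(scale=0.2, size=50)

# regplot draws the scatter points, a fitted regression line,
# and a confidence band around the line in a single call
ax = sns.regplot(x=x, y=y)
ax.set(xlabel="x", ylabel="y")

# Hypothetical output file name
out = Path("regression.png")
ax.figure.savefig(out)
```

Doing the same in plain Matplotlib would mean fitting the line and drawing the band yourself.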

Plotnine
Back in 1999, a fantastic book was published. The Grammar of Graphics presented a layered rule guide for designers and data scientists to create beautiful, meaningful, and useful data visualizations.
If you have used R before, Plotnine will feel familiar: it is based on the popular R library ggplot2, uses a similar syntax, and implements the different aspects presented in The Grammar of Graphics.
When to use Plotnine?
- The most straightforward reason to use Plotnine is that you’re transitioning from R to Python and want to create visualizations without much hassle.
- Plotnine allows the user to easily compose plots by explicitly mapping data to the visual objects forming the plot.
- The Plotnine API allows you to create different types of charts easily, with few lines of code and without needing to go back to the documentation often.
- Plotting with Plotnine is powerful because it makes custom plots easy to think about and create.
Usage Example
#Import needed libraries
import numpy as np
import pandas as pd
from plotnine import ggplot, aes, geom_point
#Add this if you're using Jupyter Notebook
%matplotlib inline
#Generate data
N = 50
#Plotnine maps aesthetics to DataFrame columns, so collect the data in a DataFrame
df = pd.DataFrame({
    "x": np.random.rand(N),
    "y": np.random.rand(N),
    "colors": np.random.rand(N),
})
#Plot scatter plot
p = ggplot(df, aes(x="x", y="y"))
p + geom_point(aes(color="colors"))

NetworkX
NetworkX is a Python library that is not solely for visualization. Rather, it is a package created to analyze, manipulate, and study the structure of complex networks.
NetworkX is one of those libraries that are field-specific or area-specific; that is, you can’t generate just any chart with it. For example, you can’t create a bar or pie chart using NetworkX.
When to use NetworkX?
- If you’re dealing with graphs or graph theory algorithms, NetworkX will allow you to implement and analyze them quickly.
- If you’re trying to study the relationships between different data points.
- If you’re trying to simulate and analyze the performance of entire networks.
Usage Example
#Import needed libraries
import networkx as nx
import matplotlib.pyplot as plt
import random
# Graph initialization
G = nx.Graph()
G.add_nodes_from([1, 2, 3, 4, 5, 6, 7, 8, 9])
G.add_edges_from([(1,2), (3,4), (2,5), (4,5), (6,7), (8,9), (4,7), (1,7), (3,5), (2,7), (5,8), (2,9), (5,7)])
#Draw network with a randomly chosen color for each of the 9 nodes
colors = ['r', 'b', 'g', 'y', 'w', 'm']
nx.draw_circular(G, node_color=[random.choice(colors) for j in range(9)])
plt.show()
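Since NetworkX is mainly an analysis package, here is a short sketch of the kind of questions you can ask about the same graph without drawing anything. Which questions to ask is my own choice for illustration; the edge list is the one used above.

```python
import networkx as nx

# Same edges as in the drawing example; adding edges also adds their nodes
G = nx.Graph()
G.add_edges_from([(1, 2), (3, 4), (2, 5), (4, 5), (6, 7), (8, 9), (4, 7),
                  (1, 7), (3, 5), (2, 7), (5, 8), (2, 9), (5, 7)])

# Can every node be reached from every other node?
connected = nx.is_connected(G)
print("Connected:", connected)

# How many edges are on the shortest route between nodes 1 and 8?
hops = nx.shortest_path_length(G, source=1, target=8)
print("Hops from 1 to 8:", hops)

# Which nodes have the most neighbors?
degrees = dict(G.degree())
busiest = max(degrees.values())
hubs = [node for node, degree in degrees.items() if degree == busiest]
print("Highest degree:", busiest, "at nodes", hubs)
```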

Missingno
Whenever you start a new data science project, you will need to perform some data exploration to understand your data better. One very annoying thing that often happens is coming across missing data entries. For a data scientist, dealing with missing data entries is one of the most challenging tasks in the entire project.
Well, Missingno comes to the rescue. It lets you check a dataset for missing entries by providing a visual summary of the dataset. So, instead of going through rows and rows of numbers, you can filter and sort the data based on completeness and on the correlation between variables.
When to use Missingno?
- If you want to speed up and simplify the data exploration phase of any project.
- If you want to display a count of the values present per column, or a matrix, heatmap, or dendrogram of the missing values.
Usage Example
Dataset used is here.
#Import needed libraries
import pandas as pd
import missingno as msno
# Load the dataset
df = pd.read_csv("Documents/kamyr-digester.csv")
# Visualize missing values
msno.matrix(df)

Dynamic plotting libraries
Plotly
Plotly is a JavaScript-based module for generating and manipulating interactive visualizations. With Plotly, you can create unique charts like dendrograms, 3D charts, and contour plots, most of which you cannot generate with most other libraries.
Moreover, Plotly has many built-in applications for machine learning and data science, which make it easier to implement and visualize standard algorithms such as ML regression and kNN classification.
When to use Plotly?
- If you want to start creating interactive data visualizations in Python, then Plotly is the way to go. It allows you to create custom charts without any hassle.
- You can create stunning animations in Plotly that help you communicate your data better.
- If you want to create beautiful maps, scientific graphs, 3D charts, or financial charts.
- Plotly allows you to add custom controls to your charts to provide more interactive functionality.
Usage Example
#Import needed libraries
import numpy as np
import plotly.express as px
#Generate data
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
#Plot scatter plot (pass the arrays by keyword; the first positional argument is the data frame)
fig = px.scatter(x=x, y=y, color=colors, size=colors)
fig.show()

Bokeh
Similar to Plotly, Bokeh is a JavaScript-based package that allows you to create stunning interactive visualizations. In addition, like Plotnine, Bokeh is an implementation of the rules presented in The Grammar of Graphics.
Bokeh provides three levels of control to accommodate different user types. The highest level allows you to create standard charts, such as bar, pie, and scatter charts. The middle level offers a degree of specificity similar to Matplotlib and allows you to control the basic building blocks of each chart. Finally, the lowest level gives you full control of every element of the chart.
When to use Bokeh?
- If you want to create nice interactive visualizations.
- If you want to perform data transformations, such as adding jitter to crowded plots.
- If you want to create beautiful 2D graphics. However, if you want 3D graphics, go with Plotly.
Usage Example
#Import needed libraries
import numpy as np
from bokeh.plotting import figure, output_file, show
#Generate data
N = 5000
x = np.random.random(size=N) * 100
y = np.random.random(size=N) * 100
radii = np.random.random(size=N) * 1.5
#Derive a hex color for each point from its position
colors = [
    "#%02x%02x%02x" % (int(r), int(g), 150) for r, g in zip(50+2*x, 30+2*y)
]
#Add tools to the interactive plot
TOOLS="hover,crosshair,pan,wheel_zoom,zoom_in,zoom_out,box_zoom,undo,redo,reset,tap,save,box_select,poly_select,lasso_select"
#Create figure
p = figure(tools=TOOLS)
#Create plot (circle glyphs sized by a radius in data units)
p.circle(x, y, radius=radii,
         fill_color=colors, fill_alpha=0.6,
         line_color=None)
#Write the plot to an HTML file
output_file("color_scatter.html", title="color_scatter.py example")
#Show plot in a new tab
show(p)

Gleam
Gleam is a Python library inspired by R’s Shiny package. It is built to allow Python developers to create interactive data visualizations for the web.
Gleam puts it all together and creates a web interface that lets anyone play with your data in real time, making it easier than ever to help others understand and interpret your data.
When to use Gleam?
- If you want to create visualizations for the web and don’t want to deal with JS, HTML, or CSS.
- If you want to give your users real-time control over your data.
Altair
Altair is a simple, user-friendly, and consistent statistical visualization Python library based on Vega-Lite. Altair allows you to create meaningful, elegant, and useful visualizations quickly with just a few lines of code.
When to use Altair?
- If you want hassle-free interactive data visualization.
- If you want to apply transformations to your data quickly and effectively.
- If you want to create declarative statistical visualizations.
- If you want to create stacked, layered, faceted, and repeated charts.
Usage Example
#Import needed libraries
import altair as alt
from vega_datasets import data
#get dataset
seattle_weather = data.seattle_weather()
#Create chart
alt.Chart(seattle_weather).mark_point().encode(
x='temp_max',
y='temp_min',
)

Folium
Folium is a beautiful Python geovisualization library used for plotting maps. Folium uses the mapping abilities of Leaflet.js to enable interactive map visualizations.
Folium gives you the ability to zoom in and out on your maps, click and drag them, and even add markers to them.
When to use Folium?
- If you want to create interactive maps, Folium is your best choice.
Usage Example
#Import needed library
import folium
#Create map object
m = folium.Map(location=[45.5236, -122.6750])
#Show map
m

Conclusion
Data visualization is the way developers and data scientists communicate their data to a broad audience. Building better, more effective data visualizations is a valuable skill that every data scientist must work on developing.
Whenever you want to create some visualizations, here’s a rule of thumb to follow. If you’re new to data science and Python and only want to create static charts, go with Seaborn. For network analysis, use NetworkX. If you want to create an interactive visualization to present, use Plotly, but if you want to use that visualization on the web, then go with Gleam. Finally, if you want to create interactive maps, Folium is your friend.
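If it helps, that rule of thumb can be written down as a tiny helper. This is only a sketch of the reasoning in this article: the function name suggest_library and its parameters are hypothetical, and the order of the checks is my own reading of the rule.

```python
def suggest_library(interactive: bool, for_web: bool = False,
                    chart_kind: str = "general") -> str:
    """Return the library this article's rule of thumb points to.

    chart_kind is "general", "network", or "map" (hypothetical labels).
    """
    # Specialized plots narrow the choice first
    if chart_kind == "network":
        return "NetworkX"
    if chart_kind == "map":
        return "Folium"
    # General charts: static for reports, interactive for presenting or the web
    if not interactive:
        return "Seaborn"
    return "Gleam" if for_web else "Plotly"


print(suggest_library(interactive=False))                    # static chart
print(suggest_library(interactive=True))                     # interactive, to present
print(suggest_library(interactive=True, for_web=True))       # interactive, for the web
print(suggest_library(interactive=True, chart_kind="map"))   # interactive map
```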
In the end, you can create compelling visualizations no matter what library you choose. And remember, complex is not always the answer. Always go with the library that provides the features you want for your visualizations.