
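Truncated Y-axis values in a seaborn scatterplot

I'm having trouble plotting a large CSV file whose Y-axis values range from 1 to over 20 million. I'm currently facing two problems.

  1. The Y axis does not show all the values it should. With the original data it only goes up to 6 million instead of covering all the data up to 20 million. With the (smaller) sample data I've put below, it only shows the first Y-axis value and none of the others.

  2. In the legend, because I used hue and style = name, "name" shows up both as the legend title and as an entry inside the legend.

Questions:

  1. Can anyone give me an example or help me figure out how to show all the Y-axis values? How do I fix it so that all Y values are displayed?

  2. How can I remove "name" from the legend without losing the shapes and colors of the scatter points?

(Please let me know if any sources exist or if this question has been answered in another post, without marking it as a duplicate. Also let me know if there are any grammar/spelling issues I should fix. Thank you!)

Below you can find the function I use to plot the graph, and the sample data.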

def test_graph(file_name):

    data_file = pd.read_csv(file_name, header=None, error_bad_lines=False, delimiter="|", index_col=False, dtype='unicode')
    data_file.rename(columns={0: 'name',
                              1: 'date',
                              2: 'name3',
                              3: 'name4',
                              4: 'name5',
                              5: 'ID',
                              6: 'counter'}, inplace=True)

    data_file.date = pd.to_datetime(data_file['date'], unit='s')

    norm = plt.Normalize(1,4)
    cmap = plt.cm.tab10

    df = pd.DataFrame(data_file)

    # Below creates and returns a dictionary of category-point combinations,
    # by cycling over the marker points specified.
    points = ['o', 'v', '^', '<', '>', '8', 's', 'p', 'H', 'D', 'd', 'P', 'X']
    mult = len(df['name']) // len(points) + (len(df['name']) % len(points) > 0)
    markers = {key:value for (key, value)
               in zip(df['name'], points * mult)} ; markers

    sc = sns.scatterplot(data = df, x=df['date'], y=df['counter'], hue = df['name'], style = df['name'], markers = markers, s=50)
    ax.set_autoscaley_on(True)

    ax.set_title("TEST", size = 12, zorder=0)

    plt.legend(title="Names", loc='center left', shadow=True, edgecolor = 'grey', handletextpad = 0.1, bbox_to_anchor=(1, 0.5))


波斯汪
1 Answer

白衣染霜花

First, some improvements to your post: you are missing the import statements

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import ticker
import seaborn as sns

The line

df = pd.DataFrame(data_file)

is not necessary, because data_file is already a DataFrame.

The lines

points = ['o', 'v', '^', '<', '>', '8', 's', 'p', 'H', 'D', 'd', 'P', 'X']
mult = len(df['name']) // len(points) + (len(df['name']) % len(points) > 0)
markers = {key:value for (key, value)
           in zip(df['name'], points * mult)}

do not cycle through points the way you expect; consider using itertools for this instead.

Also, setting the yticks like

ax.yaxis.set_major_locator(ticker.MultipleLocator(100))

puts a tick at every 100, which is probably too many if your data ranges from 0 to 20 million. Consider replacing 100 with 1000000.

I was able to reproduce your first problem. Using df.dtypes I found that the column counter is stored as type object. Because dtype='unicode' reads every column as text, matplotlib treats the counter values as categories rather than numbers. Adding the line

df['counter'] = df['counter'].astype(int)

solved your first problem for me.

I was not able to reproduce your second problem, though. For me, the resulting plot looks like this:

(plot image not preserved)

Have you tried updating all your packages to the latest version?

Edit: As a follow-up to your comment, you can also adjust the number of xticks in the plot by replacing the 1 in

ax.xaxis.set_major_locator(ticker.MultipleLocator(1))

with a higher number, say 10. Combining all my suggestions and removing the seemingly unnecessary function definition, my version of the code looks like this:

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import ticker
import seaborn as sns
import itertools

fig = plt.figure()
ax  = fig.add_subplot()

df = pd.read_csv(
    'data.csv',
    header          = None,
    error_bad_lines = False,  # pandas >= 1.3: use on_bad_lines='skip' instead
    delimiter       = "|",
    index_col       = False,
    dtype           = 'unicode')

df.rename(columns={0: 'name',
                   1: 'date',
                   2: 'name3',
                   3: 'name4',
                   4: 'name5',
                   5: 'ID',
                   6: 'counter'}, inplace=True)

df.date = pd.to_datetime(df['date'], unit='s')

# Everything was read as text, so make the counter numeric before plotting.
df['counter'] = df['counter'].astype(int)

# Take one marker per distinct name, wrapping around if the list runs out.
points  = ['o', 'v', '^', '<', '>', '8', 's', 'p', 'H', 'D', 'd', 'P', 'X']
markers = itertools.cycle(points)
markers = list(itertools.islice(markers, len(df['name'].unique())))

sc = sns.scatterplot(
    data    = df,
    x       = 'date',
    y       = 'counter',
    hue     = 'name',
    style   = 'name',
    markers = markers,
    s       = 50)

ax.set_title("TEST", size = 12, zorder=0)

ax.legend(
    title          = "Names",
    loc            = 'center left',
    shadow         = True,
    edgecolor      = 'grey',
    handletextpad  = 0.1,
    bbox_to_anchor = (1, 0.5))

ax.xaxis.set_major_locator(ticker.MultipleLocator(10))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1000000))

ax.minorticks_off()

ax.set_xlabel("Dates", fontsize = 12, labelpad = 7)
ax.set_ylabel("Counter", fontsize = 12)
ax.grid(axis='y', color='0.95')

fig.autofmt_xdate(rotation = 30)

plt.gcf().subplots_adjust(bottom=0.15)
plt.show()
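The two fixes from this answer (making counter numeric, and cycling markers over the distinct names) can be checked without drawing anything. The following is only a minimal sketch under stated assumptions: the inline SAMPLE rows and the helper cycle_markers are hypothetical stand-ins, not the asker's data or code, and pd.to_numeric is used here as an alternative to astype(int).

```python
import io
import itertools

import pandas as pd

# Hypothetical sample in the question's layout: pipe-delimited, no header row.
SAMPLE = (
    "a|1600000000|x|y|z|1|1\n"
    "b|1600086400|x|y|z|2|6000000\n"
    "c|1600172800|x|y|z|3|20000000\n"
    "a|1600259200|x|y|z|4|150000\n"
)
COLUMNS = ["name", "date", "name3", "name4", "name5", "ID", "counter"]


def cycle_markers(names, points):
    """Hypothetical helper: map each name to a marker, wrapping around."""
    return dict(zip(names, itertools.cycle(points)))


# dtype=str mimics dtype='unicode' from the question: every column is text.
df = pd.read_csv(io.StringIO(SAMPLE), header=None, sep="|",
                 names=COLUMNS, dtype=str)
was_numeric = pd.api.types.is_numeric_dtype(df["counter"])  # text, not numbers

# Convert the text columns that should be numbers or timestamps.
df["counter"] = pd.to_numeric(df["counter"])
df["date"] = pd.to_datetime(pd.to_numeric(df["date"]), unit="s")

# One marker per distinct name, in order of first appearance.
# Only two markers are offered so that the wrap-around is visible.
markers = cycle_markers(df["name"].unique(), ["o", "v"])

print(was_numeric, df["counter"].dtype, markers)
```

With the columns converted, df and the markers dict could then be passed to sns.scatterplot(data=df, x='date', y='counter', hue='name', style='name', markers=markers), so the Y axis is scaled numerically instead of being treated as a set of category labels.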