猿问

如何根据通过熊猫从LAMMPS输出文件导入的键合数据将原子细分为三组?

我是编程和分子动力学模拟的新手。我正在使用LAMMPS来模拟物理气相沉积(PVD)过程,并确定不同时间步长中原子之间的相互作用。


在我执行分子动力学模拟后,LAMMPS为我提供了一个输出键文件,其中包含每个单个原子的记录(作为原子ID),它们的类型(向特定元素的编号),以及与这些特定原子键合的其他原子的信息。 典型的债券文件如下所示。


我的目标是根据原子的类型(如组1:氧-氢-氢)对原子进行三组分类,方法是考虑它们从键输出文件中的键合信息,并计算每个时间步长的组数。我使用熊猫,并为每个时间步长创建了一个数据帧。


df = pd.read_table(directory, comment="#", delim_whitespace= True, header=None, usecols=[0,1,2,3,4,5,6] )

headers= ["ID","Type","NofB","bondID_1","bondID_2","bondID_3","bondID_4"]

df.columns = headers

df.fillna(0,inplace=True)

df = df.astype(int)


timestep = int(input("Number of Timesteps: ")) #To display desired number of timesteps.

total_atom_number = 53924 #Total number of atoms in the simulation.


t= 0 #code starts from 0th timestep.

firstTime = []

while(t <= timestep):

    open('file.txt', 'w').close() #In while loop = displays every timestep individually, Out of the while loop = displays results cumulatively.

    i = 0

    df_tablo =(df[total_atom_number*t:total_atom_number*(t+1)]) #Creates a new dataframe that inlucdes only t'th timestep.

    df_tablo.reset_index(inplace=True)



    print(df_tablo)

请参阅此示例,该示例说明了我对 3 个原子进行分组的算法。键合列显示与其行中的原子键合在一起的不同原子(按原子 ID)。例如,通过使用该算法,我们可以对[1,2,5]和[1,2,6]进行分组,但不能对[1,2,1]进行分组,因为原子不能与自身建立键。此外,我们可以在分组后将这些原子ID(第一列)转换为它们的原子类型(第二列),例如[1,3,7]到[1,1,3]。


通过遵循上面提到的键,1)我可以成功地将原子相对于它们的ID分组,2)将它们转换为它们的原子类型,3)分别计算每个时间步中存在的组数。第一个 while 循环(上图)计算每个时间步的组,而第二个 while 循环(下图)将每行中的原子(等于存在的每个原子 ID)与数据帧中不同行的相应绑定原子进行分组。


慕田峪9158850
浏览 82回答 1
1回答

繁星coding

我不确定我是否理解了逻辑,看看这是否有帮助。对于100000三重奏,需要41秒。loc,get_loc是非常广泛的操作,所以把你的表放在字典里,而不是验证一切都是唯一的,把它放在一个集合中import pandas as pdimport randomfrom collections import defaultdict as ddfrom collections import Counterimport time# create 100000 unique trios of numbersids = list(range(50000))trios_set = set()while len(trios_set)<100000:&nbsp; &nbsp; trio = random.sample(ids,3)&nbsp; &nbsp; trios_set.add(frozenset(trio))ids_dict = dd(list)&nbsp; # a dictionery where id is the key and value is all the id who are partner with it in a listfor s in trios_set:&nbsp; &nbsp; for id in s:&nbsp; &nbsp; &nbsp; &nbsp; for other_id in s:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if id!= other_id:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ids_dict[id].append(other_id)ids_dict = dict(ids_dict)for_df = []type_list = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n"]&nbsp;for id in ids_dict:&nbsp; &nbsp; massage = {}&nbsp; &nbsp; massage["id"] = id&nbsp; &nbsp; other_id_index = 1&nbsp; &nbsp; for other_id in ids_dict[id]:&nbsp; &nbsp; &nbsp; &nbsp; massage["id_"+str(other_id_index)] =&nbsp; other_id&nbsp; &nbsp; &nbsp; &nbsp; other_id_index+=1&nbsp; &nbsp; massage["type"] = random.choice(type_list)&nbsp; &nbsp; for_df.append(massage)df = pd.DataFrame(for_df) # a table with id column and all ids who are with it in trios in id_1 id_2.. and type column with a letter#------------------------------------------------------------------#till here we built the input tablestart_time = time.time() #till here we build the input table, now check the time for 100000 atomstype_dict = {}from_df = dd(set)for i,r in df.iterrows(): #move the dataframe to a dict of id as key and value as list of ids who connected to it&nbsp; &nbsp; for col in df:&nbsp; &nbsp; &nbsp; &nbsp; if "id_"in col and str(r[col])!="nan":&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; from_df[r["id"]].add(r[col])&nbsp; &nbsp; type_dict[r["id"]] = r["type"] #save the type of id in a dictioneryfrom_df = dict(from_df)out_trio_set = set()&nbsp;for id in from_df:&nbsp; &nbsp; for other_id in from_df[id]:&nbsp; &nbsp; &nbsp; &nbsp; if other_id!= id and str(other_id)!="nan":&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; for third_id in from_df[other_id]:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; current_trio = frozenset([id, other_id,third_id])&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if len(current_trio)==3:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; out_trio_set.add(current_trio)type_conter = Counter()for trio in out_trio_set:&nbsp; &nbsp; type_list = []&nbsp; &nbsp; for id in trio:&nbsp; &nbsp; &nbsp; &nbsp; type_list.append(type_dict[id])&nbsp; &nbsp; type_list = sorted(type_list)&nbsp; &nbsp; atom_type = "".join(type_list)&nbsp; &nbsp; type_conter[atom_type] +=1out_df = pd.DataFrame(type_conter, index = [1]) # in here put index as timestampout_df.to_excel(r"D:\atom.xlsx")print("--- %s seconds ---" % (time.time() - start_time))
随时随地看视频慕课网APP

相关分类

Python
我要回答