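I'm a college student working on a semester-long project, and I've hit a wall. Before going further, please know that I've looked at similar threads on Stack Overflow, and none of them seem to match my situation.
I have a string input generated from a PDF that contains a lot of data from a table. The problem is that, because of the formatting, some entries in the department column are split from one line into two, and I can't work around it. For example,
PS 253 (handled fine by my algorithm)
MA
243HON (breaks everything)
I need to end up with them on the same line, removing the "\n" after MA, so that I can pass the text to the rest of the program. I tried checking for a \n one or two index positions after the department code (MA) and changing the index from which 243HON is read, but that didn't work.
I also tried string = string.replaceAll("MA \n", "MA "), as shown in the code. Removing the space between MA and \n made no difference. Here is the relevant part of my code. Thanks!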
public static String[] departments = {"\nAS","\nSF","\nAE","\nAF","\nAT","\nLAR","\nAMS","\nBIO","\nBA","\nCHM","\nLCH","\nCIV","\nCSO",
"\nCOM","\nCEC","\nCS","\nCYB","\nEC","\nEE","\nEGR","\nEP","\nES","\nFA","\nGCS","\nHS","\nHON","\nHF","\nHU","\nMA","\nME","\nWX",
"\nMSL","\nNSC","\nPE","\nPS","\nPSY","\nSIM","\nSS","\nSE","\nSP","\nSYS","\nUNIV","\nUA"};
public static String[] departmentsFix = {"\nAS \n","\nSF \n","\nAE \n","\nAF \n","\nAT \n","\nLAR \n","\nAMS \n","\nBIO \n","\nBA \n","\nCHM \n","\nLCH \n","\nCIV \n","\nCSO \n",
"\nCOM \n","\nCEC \n","\nCS \n","\nCYB \n","\nEC \n","\nEE \n","\nEGR \n","\nEP \n","\nES \n","\nFA \n","\nGCS \n","\nHS \n","\nHON \n","\nHF \n","\nHU \n","\nMA \n","\nME \n","\nWX \n",
"\nMSL \n","\nNSC \n","\nPE \n","\nPS \n","\nPSY \n","\nSIM \n","\nSS \n","\nSE \n","\nSP \n","\nSYS \n","\nUNIV \n","\nUA \n"};
public static void main(String[] args) {
    // TODO Auto-generated method stub
    Loader loader = new Loader();
    try {
        File file = new File("C:\\Users\\User\\Desktop\\EclipseWorkspace\\SE 300\\ER_SCHED_PRT.pdf");
        PDDocument document = PDDocument.load(file);
        PDFTextStripper s = new PDFTextStripper();
        loader.content = s.getText(document);
        String[] splitString = loader.content.split("Instructor", 2);
        loader.content = splitString[1];
        int index = 0;
        for (String y : departmentsFix) {
            // find any departments with a \n after them and replace it with a space
            loader.content = loader.content.replaceAll(y, departments[index] + " ");
            index++;
        }
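For reference, here is a minimal, self-contained sketch of the kind of replacement I'm after. It rests on an assumption I haven't confirmed against the raw extracted text: that the line breaks in the string may be "\r\n" rather than a bare "\n" (I'm on Windows), which would explain why a literal "MA \n" never matches. The class name DepartmentLineJoiner, the method joinSplitDepartments, and the two-code sample array are hypothetical and not part of my project. The pattern uses \R (any line break, Java 8+) and only joins when the next line starts with a digit.

```java
import java.util.regex.Pattern;

class DepartmentLineJoiner {

    // Hypothetical helper: joins a department code that sits alone on a line
    // with the course number on the following line.
    static String joinSplitDepartments(String text, String[] codes) {
        // Build an alternation such as \QPS\E|\QMA\E from the plain codes.
        StringBuilder alternation = new StringBuilder();
        for (String code : codes) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(code));
        }
        // (?m)^      : code must start at the beginning of a line
        // [ \t]*     : optional trailing spaces or tabs after the code
        // \R         : any line break, including \r\n, \n, and \r
        // (?=\d)     : only join when the next line starts with a digit
        Pattern split = Pattern.compile("(?m)^(" + alternation + ")[ \\t]*\\R(?=\\d)");
        return split.matcher(text).replaceAll("$1 ");
    }

    public static void main(String[] args) {
        // Assumed sample: Windows-style line endings, as in my example above.
        String sample = "PS 253\r\nMA \r\n243HON\r\n";
        String fixed = joinSplitDepartments(sample, new String[] {"PS", "MA"});
        // Make the line endings visible when printing.
        System.out.println(fixed.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

With this approach the codes would be listed without the leading "\n" used in my departments array, since the ^ anchor takes care of the line start.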