{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# **Analysis 2-2**\n",
"\n",
"## Strategy & Focus\n",
"\n",
"- Try clustering, i.e. grouping related terms together (e.g. Prüfung, Prüfen, Überprüfung, all variants of 'inspection/check')\n",
"- Use the frequency distribution as a guide: analyze the more frequent terms first"
]
},
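{
"cell_type": "markdown",
"metadata": {},
"source": [
"The term-clustering strategy above can be sketched with a token frequency count plus a crude grouping rule. This is a minimal sketch under stated assumptions: the stem list `STEMS` and the sample descriptions are hypothetical placeholders, and substring matching on a stem stands in for proper lemmatization. Matching on `prüf` also catches the prefixed form *Überprüfung*, which a suffix-stripping stemmer would not group with *Prüfung*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"from collections import Counter\n",
"\n",
"# hypothetical sample descriptions (placeholders for raw['VorgangsBeschreibung'])\n",
"descriptions = ['Prüfung der Maschine', 'Maschine prüfen', 'Überprüfung Kettbaum', 'Kettbaum kaputt']\n",
"# hypothetical stems that define the term clusters\n",
"STEMS = ['prüf', 'kettbaum']\n",
"\n",
"# frequency of lower-cased word tokens across all descriptions\n",
"freq = Counter(t for d in descriptions for t in re.findall(r'\\w+', d.lower()))\n",
"\n",
"# assign each token to the first stem it contains (crude substring match)\n",
"clusters = {stem: Counter() for stem in STEMS}\n",
"for token, n in freq.items():\n",
"    for stem in STEMS:\n",
"        if stem in token:\n",
"            clusters[stem][token] += n\n",
"            break\n",
"\n",
"# analyze the most frequent clusters first\n",
"for stem, terms in sorted(clusters.items(), key=lambda kv: -sum(kv[1].values())):\n",
"    print(stem, sum(terms.values()), dict(terms))"
]
},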
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"# Feature 1: Clustering of Order Descriptions (`VorgangsBeschreibung`)\n",
"\n",
"## Research\n",
"[Textmining HS Hannover](https://textmining.wp.hs-hannover.de/Preprocessing.html)\n",
"\n",
"### General Decomposition of the Individual Descriptions\n",
"\n",
"- Text into sentences\n",
"- Sentences into words\n",
"- Words into their base form:\n",
"  - Lemma: the form of the word as listed in a dictionary, e.g. Haus, laufen, begründen\n",
"  - Stem: the word without its inflectional affixes (prefixes and suffixes), e.g. Haus, lauf, begründ\n",
"  - Root: the core of the word from which it may have been formed by derivation, e.g. Haus, lauf, Grund\n",
"- Part-of-speech determination\n",
"  - classic part-of-speech tagging (conventional word classes)\n",
"  - Named Entity Recognition (NER) (proper names)\n",
"    - e.g. spaCy: person, location, organization, miscellaneous\n",
"\n",
"#### Semantics\n",
"\n",
"- Words within the same sentence are more closely related to each other than to words outside that sentence\n",
"\n",
"### Packages\n",
"\n",
"- English:\n",
"  - [NLTK](https://www.nltk.org/)\n",
"- German:\n",
"  - [HanTa - The Hanover Tagger](https://github.com/wartaal/HanTa/tree/master)\n",
"  - [TreeTagger](https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/)\n",
"    - [Python Wrapper](https://treetaggerwrapper.readthedocs.io/en/latest/)\n",
"  - [spaCy](https://spacy.io/)\n",
"    - [Example 1](https://www.trinnovative.de/blog/2020-09-08-natural-language-processing-mit-spacy.html)"
]
},
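{
"cell_type": "markdown",
"metadata": {},
"source": [
"The decomposition steps above can be sketched with spaCy. This is a sketch under assumptions: it loads the small German pipeline `de_core_news_sm` (installed separately with `python -m spacy download de_core_news_sm`), and the example sentence is made up. spaCy returns lemmas but has no built-in stemmer; the German `de_core_news_*` pipelines label named entities as `PER`, `LOC`, `ORG` and `MISC`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import spacy\n",
"\n",
"# assumption: de_core_news_sm is installed (any German pipeline with a parser and NER would do)\n",
"nlp_demo = spacy.load('de_core_news_sm')\n",
"# hypothetical example text\n",
"doc = nlp_demo('Der Kettbaum ist kaputt. Herr Schmidt prüft die Webmaschine in Hannover.')\n",
"\n",
"for sent in doc.sents:  # text -> sentences\n",
"    for token in sent:  # sentence -> words\n",
"        # lemma (dictionary form), coarse part of speech and fine-grained STTS tag\n",
"        print(token.text, token.lemma_, token.pos_, token.tag_)\n",
"\n",
"# named entities found by the NER component\n",
"for ent in doc.ents:\n",
"    print(ent.text, ent.label_)"
]
},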
{
"cell_type": "markdown",
"metadata": {},
"source": [
"21.02.:\n",
"- Revised the RegEx filtering\n",
"- Improved duplicate detection via similarity"
]
},
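{
"cell_type": "markdown",
"metadata": {},
"source": [
"Duplicate detection via similarity can be sketched as pairwise cosine similarity between description embeddings, flagging pairs above a threshold. Assumptions: the random vectors below are placeholders for real sentence embeddings (e.g. from the `SentenceTransformer` model loaded further down), and the threshold `0.9` is a hypothetical value that would need tuning. The loop over `combinations` is quadratic, so for the 124,087 non-null descriptions (about 7.7 billion pairs) a blocked matrix product or an approximate nearest-neighbour index would be needed instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from itertools import combinations\n",
"\n",
"# placeholder embeddings, one row per description\n",
"rng = np.random.default_rng(0)\n",
"embeddings = rng.normal(size=(5, 8))\n",
"embeddings[3] = embeddings[1] + 0.01  # construct a near-duplicate of row 1\n",
"\n",
"# L2-normalize rows so that a dot product equals cosine similarity\n",
"normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)\n",
"\n",
"THRESHOLD = 0.9  # hypothetical cut-off\n",
"duplicates = [\n",
"    (i, j, float(normed[i] @ normed[j]))\n",
"    for i, j in combinations(range(len(normed)), 2)\n",
"    if normed[i] @ normed[j] >= THRESHOLD\n",
"]\n",
"print(duplicates)"
]
},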
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analysis"
]
},
{
"cell_type": "code",
"execution_count": 339,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from pandas import DataFrame, Series\n",
"import spacy\n",
"from spacy.lang.de import German as GermanSpacyModel\n",
"import sentence_transformers\n",
"from sentence_transformers import SentenceTransformer\n",
"from collections import Counter\n",
"from itertools import combinations\n",
"from dateutil.parser import parse\n",
"import re\n",
"from spellchecker import SpellChecker\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"import logging\n",
"import sys\n",
"import pickle\n",
"\n",
"LOGGING_LEVEL = 'INFO'\n",
"logging.basicConfig(level=LOGGING_LEVEL, stream=sys.stdout)\n",
"logger = logging.getLogger('base')"
]
},
{
"cell_type": "code",
"execution_count": 340,
"metadata": {},
"outputs": [],
"source": [
"def save_pickle(obj, path):\n",
"    \"\"\"Serialize obj to path using the highest available pickle protocol.\"\"\"\n",
"    with open(path, 'wb') as file:\n",
"        pickle.dump(obj, file, protocol=pickle.HIGHEST_PROTOCOL)\n",
"\n",
"def load_pickle(path):\n",
"    \"\"\"Load and return the object pickled at path.\"\"\"\n",
"    with open(path, 'rb') as file:\n",
"        obj = pickle.load(file)\n",
"    return obj"
]
},
{
"cell_type": "code",
"execution_count": 341,
"metadata": {},
"outputs": [],
"source": [
"sns.set()\n",
"LOAD_CALC_FILES = False\n",
"\n",
"DESC_BLACKLIST = set(['-'])\n",
"# earlier, more extensive blacklist (disabled by wrapping it in a string literal)\n",
"\"\"\"\n",
"GENERAL_BLACKLIST = set([\n",
"    'herr', 'hr.', 'förster', 'graf', 'stöppel',\n",
"    'stab', 'kw', 'h.', 'koch', 'heininger', '.',\n",
"    'schwab', 'm.', 'wenninger', '-', '--',\n",
"])\n",
"\"\"\"\n",
"\n",
"# general token blacklist (lower-case)\n",
"GENERAL_BLACKLIST = set([\n",
"    'herr', 'hr.', 'kw', 'h.', '.',\n",
"    'm.', '-', '--', 'dr.', 'dr',\n",
"])\n",
"\n",
"#GENERAL_BLACKLIST = set()\n",
"#POS_of_interest = set(['NOUN', 'PROPN', 'ADJ', 'VERB', 'AUX'])\n",
"# coarse POS tags (spaCy token.pos_) and fine-grained tags (token.tag_) of interest\n",
"POS_of_interest = set(['NOUN', 'ADJ', 'VERB', 'AUX'])\n",
"TAG_of_interest = set(['ADJD'])"
]
},
{
"cell_type": "code",
"execution_count": 388,
"metadata": {},
"outputs": [
{
"ename": "RuntimeError",
"evalue": "Error(s) in loading state_dict for BertModel:\n\tUnexpected key(s) in state_dict: \"embeddings.position_ids\". ",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mRuntimeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[1;32mIn[388], line 5\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[38;5;66;03m# load language model\u001b[39;00m\n\u001b[0;32m 2\u001b[0m \u001b[38;5;66;03m# transformer model without vector embeddings\u001b[39;00m\n\u001b[0;32m 3\u001b[0m \u001b[38;5;66;03m# can not be used to calculate similarities\u001b[39;00m\n\u001b[0;32m 4\u001b[0m \u001b[38;5;66;03m# using sentence transformers instead\u001b[39;00m\n\u001b[1;32m----> 5\u001b[0m nlp \u001b[38;5;241m=\u001b[39m \u001b[43mspacy\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mload\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mde_dep_news_trf\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[0;32m 6\u001b[0m \u001b[38;5;66;03m#nlp = spacy.load('de_core_news_lg')\u001b[39;00m\n",
"File \u001b[1;32mc:\\Users\\foersterflorian\\mambaforge\\envs\\test\\Lib\\site-packages\\spacy\\__init__.py:51\u001b[0m, in \u001b[0;36mload\u001b[1;34m(name, vocab, disable, enable, exclude, config)\u001b[0m\n\u001b[0;32m 27\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mload\u001b[39m(\n\u001b[0;32m 28\u001b[0m name: Union[\u001b[38;5;28mstr\u001b[39m, Path],\n\u001b[0;32m 29\u001b[0m \u001b[38;5;241m*\u001b[39m,\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 34\u001b[0m config: Union[Dict[\u001b[38;5;28mstr\u001b[39m, Any], Config] \u001b[38;5;241m=\u001b[39m util\u001b[38;5;241m.\u001b[39mSimpleFrozenDict(),\n\u001b[0;32m 35\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Language:\n\u001b[0;32m 36\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"Load a spaCy model from an installed package or a local path.\u001b[39;00m\n\u001b[0;32m 37\u001b[0m \n\u001b[0;32m 38\u001b[0m \u001b[38;5;124;03m name (str): Package name or model path.\u001b[39;00m\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 49\u001b[0m \u001b[38;5;124;03m RETURNS (Language): The loaded nlp object.\u001b[39;00m\n\u001b[0;32m 50\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n\u001b[1;32m---> 51\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mutil\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mload_model\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m 52\u001b[0m \u001b[43m \u001b[49m\u001b[43mname\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 53\u001b[0m \u001b[43m \u001b[49m\u001b[43mvocab\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mvocab\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 54\u001b[0m \u001b[43m \u001b[49m\u001b[43mdisable\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdisable\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 55\u001b[0m \u001b[43m \u001b[49m\u001b[43menable\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43menable\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 56\u001b[0m \u001b[43m \u001b[49m\u001b[43mexclude\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mexclude\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 57\u001b[0m \u001b[43m \u001b[49m\u001b[43mconfig\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconfig\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 58\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[1;32mc:\\Users\\foersterflorian\\mambaforge\\envs\\test\\Lib\\site-packages\\spacy\\util.py:465\u001b[0m, in \u001b[0;36mload_model\u001b[1;34m(name, vocab, disable, enable, exclude, config)\u001b[0m\n\u001b[0;32m 463\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m get_lang_class(name\u001b[38;5;241m.\u001b[39mreplace(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mblank:\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m\"\u001b[39m))()\n\u001b[0;32m 464\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m is_package(name): \u001b[38;5;66;03m# installed as package\u001b[39;00m\n\u001b[1;32m--> 465\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mload_model_from_package\u001b[49m\u001b[43m(\u001b[49m\u001b[43mname\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;66;03m# type: ignore[arg-type]\u001b[39;00m\n\u001b[0;32m 466\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m Path(name)\u001b[38;5;241m.\u001b[39mexists(): \u001b[38;5;66;03m# path to model data directory\u001b[39;00m\n\u001b[0;32m 467\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m load_model_from_path(Path(name), \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs) \u001b[38;5;66;03m# type: ignore[arg-type]\u001b[39;00m\n",
"File \u001b[1;32mc:\\Users\\foersterflorian\\mambaforge\\envs\\test\\Lib\\site-packages\\spacy\\util.py:501\u001b[0m, in \u001b[0;36mload_model_from_package\u001b[1;34m(name, vocab, disable, enable, exclude, config)\u001b[0m\n\u001b[0;32m 484\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"Load a model from an installed package.\u001b[39;00m\n\u001b[0;32m 485\u001b[0m \n\u001b[0;32m 486\u001b[0m \u001b[38;5;124;03mname (str): The package name.\u001b[39;00m\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 498\u001b[0m \u001b[38;5;124;03mRETURNS (Language): The loaded nlp object.\u001b[39;00m\n\u001b[0;32m 499\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[0;32m 500\u001b[0m \u001b[38;5;28mcls\u001b[39m \u001b[38;5;241m=\u001b[39m importlib\u001b[38;5;241m.\u001b[39mimport_module(name)\n\u001b[1;32m--> 501\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mcls\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mload\u001b[49m\u001b[43m(\u001b[49m\u001b[43mvocab\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mvocab\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdisable\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdisable\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43menable\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43menable\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mexclude\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mexclude\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mconfig\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconfig\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[1;32mc:\\Users\\foersterflorian\\mambaforge\\envs\\test\\Lib\\site-packages\\de_dep_news_trf\\__init__.py:10\u001b[0m, in \u001b[0;36mload\u001b[1;34m(**overrides)\u001b[0m\n\u001b[0;32m 9\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mload\u001b[39m(\u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39moverrides):\n\u001b[1;32m---> 10\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mload_model_from_init_py\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;18;43m__file__\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43moverrides\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[1;32mc:\\Users\\foersterflorian\\mambaforge\\envs\\test\\Lib\\site-packages\\spacy\\util.py:682\u001b[0m, in \u001b[0;36mload_model_from_init_py\u001b[1;34m(init_file, vocab, disable, enable, exclude, config)\u001b[0m\n\u001b[0;32m 680\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m model_path\u001b[38;5;241m.\u001b[39mexists():\n\u001b[0;32m 681\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mIOError\u001b[39;00m(Errors\u001b[38;5;241m.\u001b[39mE052\u001b[38;5;241m.\u001b[39mformat(path\u001b[38;5;241m=\u001b[39mdata_path))\n\u001b[1;32m--> 682\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mload_model_from_path\u001b[49m\u001b[43m(\u001b[49m\n\u001b[0;32m 683\u001b[0m \u001b[43m \u001b[49m\u001b[43mdata_path\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 684\u001b[0m \u001b[43m \u001b[49m\u001b[43mvocab\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mvocab\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 685\u001b[0m \u001b[43m \u001b[49m\u001b[43mmeta\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mmeta\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 686\u001b[0m \u001b[43m \u001b[49m\u001b[43mdisable\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdisable\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 687\u001b[0m \u001b[43m \u001b[49m\u001b[43menable\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43menable\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 688\u001b[0m \u001b[43m \u001b[49m\u001b[43mexclude\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mexclude\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 689\u001b[0m \u001b[43m \u001b[49m\u001b[43mconfig\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconfig\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m 690\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[1;32mc:\\Users\\foersterflorian\\mambaforge\\envs\\test\\Lib\\site-packages\\spacy\\util.py:547\u001b[0m, in \u001b[0;36mload_model_from_path\u001b[1;34m(model_path, meta, vocab, disable, enable, exclude, config)\u001b[0m\n\u001b[0;32m 538\u001b[0m config \u001b[38;5;241m=\u001b[39m load_config(config_path, overrides\u001b[38;5;241m=\u001b[39moverrides)\n\u001b[0;32m 539\u001b[0m nlp \u001b[38;5;241m=\u001b[39m load_model_from_config(\n\u001b[0;32m 540\u001b[0m config,\n\u001b[0;32m 541\u001b[0m vocab\u001b[38;5;241m=\u001b[39mvocab,\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 545\u001b[0m meta\u001b[38;5;241m=\u001b[39mmeta,\n\u001b[0;32m 546\u001b[0m )\n\u001b[1;32m--> 547\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mnlp\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfrom_disk\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodel_path\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mexclude\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mexclude\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43moverrides\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43moverrides\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[1;32mc:\\Users\\foersterflorian\\mambaforge\\envs\\test\\Lib\\site-packages\\spacy\\language.py:2156\u001b[0m, in \u001b[0;36mLanguage.from_disk\u001b[1;34m(self, path, exclude, overrides)\u001b[0m\n\u001b[0;32m 2153\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m (path \u001b[38;5;241m/\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mvocab\u001b[39m\u001b[38;5;124m\"\u001b[39m)\u001b[38;5;241m.\u001b[39mexists() \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mvocab\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m exclude: \u001b[38;5;66;03m# type: ignore[operator]\u001b[39;00m\n\u001b[0;32m 2154\u001b[0m \u001b[38;5;66;03m# Convert to list here in case exclude is (default) tuple\u001b[39;00m\n\u001b[0;32m 2155\u001b[0m exclude \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mlist\u001b[39m(exclude) \u001b[38;5;241m+\u001b[39m [\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mvocab\u001b[39m\u001b[38;5;124m\"\u001b[39m]\n\u001b[1;32m-> 2156\u001b[0m \u001b[43mutil\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfrom_disk\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpath\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdeserializers\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mexclude\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;66;03m# type: ignore[arg-type]\u001b[39;00m\n\u001b[0;32m 2157\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_path \u001b[38;5;241m=\u001b[39m path \u001b[38;5;66;03m# type: ignore[assignment]\u001b[39;00m\n\u001b[0;32m 2158\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_link_components()\n",
"File \u001b[1;32mc:\\Users\\foersterflorian\\mambaforge\\envs\\test\\Lib\\site-packages\\spacy\\util.py:1392\u001b[0m, in \u001b[0;36mfrom_disk\u001b[1;34m(path, readers, exclude)\u001b[0m\n\u001b[0;32m 1389\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m key, reader \u001b[38;5;129;01min\u001b[39;00m readers\u001b[38;5;241m.\u001b[39mitems():\n\u001b[0;32m 1390\u001b[0m \u001b[38;5;66;03m# Split to support file names like meta.json\u001b[39;00m\n\u001b[0;32m 1391\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m key\u001b[38;5;241m.\u001b[39msplit(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m.\u001b[39m\u001b[38;5;124m\"\u001b[39m)[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m exclude:\n\u001b[1;32m-> 1392\u001b[0m \u001b[43mreader\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpath\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m/\u001b[39;49m\u001b[43m \u001b[49m\u001b[43mkey\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 1393\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m path\n",
"File \u001b[1;32mc:\\Users\\foersterflorian\\mambaforge\\envs\\test\\Lib\\site-packages\\spacy\\language.py:2150\u001b[0m, in \u001b[0;36mLanguage.from_disk.<locals>.<lambda>\u001b[1;34m(p, proc)\u001b[0m\n\u001b[0;32m 2148\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mhasattr\u001b[39m(proc, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mfrom_disk\u001b[39m\u001b[38;5;124m\"\u001b[39m):\n\u001b[0;32m 2149\u001b[0m \u001b[38;5;28;01mcontinue\u001b[39;00m\n\u001b[1;32m-> 2150\u001b[0m deserializers[name] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mlambda\u001b[39;00m p, proc\u001b[38;5;241m=\u001b[39mproc: \u001b[43mproc\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfrom_disk\u001b[49m\u001b[43m(\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;66;43;03m# type: ignore[misc]\u001b[39;49;00m\n\u001b[0;32m 2151\u001b[0m \u001b[43m \u001b[49m\u001b[43mp\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mexclude\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mvocab\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\n\u001b[0;32m 2152\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 2153\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m (path \u001b[38;5;241m/\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mvocab\u001b[39m\u001b[38;5;124m\"\u001b[39m)\u001b[38;5;241m.\u001b[39mexists() \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mvocab\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m exclude: \u001b[38;5;66;03m# type: ignore[operator]\u001b[39;00m\n\u001b[0;32m 2154\u001b[0m \u001b[38;5;66;03m# Convert to list here in case exclude is (default) tuple\u001b[39;00m\n\u001b[0;32m 2155\u001b[0m exclude \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mlist\u001b[39m(exclude) \u001b[38;5;241m+\u001b[39m [\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mvocab\u001b[39m\u001b[38;5;124m\"\u001b[39m]\n",
"File \u001b[1;32mc:\\Users\\foersterflorian\\mambaforge\\envs\\test\\Lib\\site-packages\\spacy_transformers\\pipeline_component.py:416\u001b[0m, in \u001b[0;36mTransformer.from_disk\u001b[1;34m(self, path, exclude)\u001b[0m\n\u001b[0;32m 409\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmodel\u001b[38;5;241m.\u001b[39mattrs[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mset_transformer\u001b[39m\u001b[38;5;124m\"\u001b[39m](\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmodel, hf_model)\n\u001b[0;32m 411\u001b[0m deserialize \u001b[38;5;241m=\u001b[39m {\n\u001b[0;32m 412\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mvocab\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mvocab\u001b[38;5;241m.\u001b[39mfrom_disk,\n\u001b[0;32m 413\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcfg\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;28;01mlambda\u001b[39;00m p: \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mcfg\u001b[38;5;241m.\u001b[39mupdate(deserialize_config(p)),\n\u001b[0;32m 414\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mmodel\u001b[39m\u001b[38;5;124m\"\u001b[39m: load_model,\n\u001b[0;32m 415\u001b[0m }\n\u001b[1;32m--> 416\u001b[0m \u001b[43mutil\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfrom_disk\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpath\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdeserialize\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mexclude\u001b[49m\u001b[43m)\u001b[49m \u001b[38;5;66;03m# type: ignore\u001b[39;00m\n\u001b[0;32m 417\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\n",
"File \u001b[1;32mc:\\Users\\foersterflorian\\mambaforge\\envs\\test\\Lib\\site-packages\\spacy\\util.py:1392\u001b[0m, in \u001b[0;36mfrom_disk\u001b[1;34m(path, readers, exclude)\u001b[0m\n\u001b[0;32m 1389\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m key, reader \u001b[38;5;129;01min\u001b[39;00m readers\u001b[38;5;241m.\u001b[39mitems():\n\u001b[0;32m 1390\u001b[0m \u001b[38;5;66;03m# Split to support file names like meta.json\u001b[39;00m\n\u001b[0;32m 1391\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m key\u001b[38;5;241m.\u001b[39msplit(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m.\u001b[39m\u001b[38;5;124m\"\u001b[39m)[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m exclude:\n\u001b[1;32m-> 1392\u001b[0m \u001b[43mreader\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpath\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m/\u001b[39;49m\u001b[43m \u001b[49m\u001b[43mkey\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 1393\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m path\n",
"File \u001b[1;32mc:\\Users\\foersterflorian\\mambaforge\\envs\\test\\Lib\\site-packages\\spacy_transformers\\pipeline_component.py:390\u001b[0m, in \u001b[0;36mTransformer.from_disk.<locals>.load_model\u001b[1;34m(p)\u001b[0m\n\u001b[0;32m 388\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m 389\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m \u001b[38;5;28mopen\u001b[39m(p, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrb\u001b[39m\u001b[38;5;124m\"\u001b[39m) \u001b[38;5;28;01mas\u001b[39;00m mfile:\n\u001b[1;32m--> 390\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mmodel\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfrom_bytes\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmfile\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mread\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 391\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mAttributeError\u001b[39;00m:\n\u001b[0;32m 392\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(Errors\u001b[38;5;241m.\u001b[39mE149) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m\n",
"File \u001b[1;32mc:\\Users\\foersterflorian\\mambaforge\\envs\\test\\Lib\\site-packages\\thinc\\model.py:619\u001b[0m, in \u001b[0;36mModel.from_bytes\u001b[1;34m(self, bytes_data)\u001b[0m\n\u001b[0;32m 617\u001b[0m msg \u001b[38;5;241m=\u001b[39m srsly\u001b[38;5;241m.\u001b[39mmsgpack_loads(bytes_data)\n\u001b[0;32m 618\u001b[0m msg \u001b[38;5;241m=\u001b[39m convert_recursive(is_xp_array, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mops\u001b[38;5;241m.\u001b[39masarray, msg)\n\u001b[1;32m--> 619\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfrom_dict\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmsg\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[1;32mc:\\Users\\foersterflorian\\mambaforge\\envs\\test\\Lib\\site-packages\\thinc\\model.py:657\u001b[0m, in \u001b[0;36mModel.from_dict\u001b[1;34m(self, msg)\u001b[0m\n\u001b[0;32m 655\u001b[0m node\u001b[38;5;241m.\u001b[39mset_param(param_name, value)\n\u001b[0;32m 656\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m i, shim_bytes \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28menumerate\u001b[39m(msg[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mshims\u001b[39m\u001b[38;5;124m\"\u001b[39m][i]):\n\u001b[1;32m--> 657\u001b[0m \u001b[43mnode\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mshims\u001b[49m\u001b[43m[\u001b[49m\u001b[43mi\u001b[49m\u001b[43m]\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfrom_bytes\u001b[49m\u001b[43m(\u001b[49m\u001b[43mshim_bytes\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 658\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\n",
"File \u001b[1;32mc:\\Users\\foersterflorian\\mambaforge\\envs\\test\\Lib\\site-packages\\spacy_transformers\\layers\\hf_shim.py:120\u001b[0m, in \u001b[0;36mHFShim.from_bytes\u001b[1;34m(self, bytes_data)\u001b[0m\n\u001b[0;32m 118\u001b[0m filelike\u001b[38;5;241m.\u001b[39mseek(\u001b[38;5;241m0\u001b[39m)\n\u001b[0;32m 119\u001b[0m device \u001b[38;5;241m=\u001b[39m get_torch_default_device()\n\u001b[1;32m--> 120\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_model\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mload_state_dict\u001b[49m\u001b[43m(\u001b[49m\u001b[43mtorch\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mload\u001b[49m\u001b[43m(\u001b[49m\u001b[43mfilelike\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmap_location\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdevice\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 121\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_model\u001b[38;5;241m.\u001b[39mto(device)\n\u001b[0;32m 122\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n",
"File \u001b[1;32mc:\\Users\\foersterflorian\\mambaforge\\envs\\test\\Lib\\site-packages\\torch\\nn\\modules\\module.py:2041\u001b[0m, in \u001b[0;36mModule.load_state_dict\u001b[1;34m(self, state_dict, strict)\u001b[0m\n\u001b[0;32m 2036\u001b[0m error_msgs\u001b[38;5;241m.\u001b[39minsert(\n\u001b[0;32m 2037\u001b[0m \u001b[38;5;241m0\u001b[39m, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mMissing key(s) in state_dict: \u001b[39m\u001b[38;5;132;01m{}\u001b[39;00m\u001b[38;5;124m. \u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;241m.\u001b[39mformat(\n\u001b[0;32m 2038\u001b[0m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124m, \u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;241m.\u001b[39mjoin(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;241m.\u001b[39mformat(k) \u001b[38;5;28;01mfor\u001b[39;00m k \u001b[38;5;129;01min\u001b[39;00m missing_keys)))\n\u001b[0;32m 2040\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(error_msgs) \u001b[38;5;241m>\u001b[39m \u001b[38;5;241m0\u001b[39m:\n\u001b[1;32m-> 2041\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mRuntimeError\u001b[39;00m(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mError(s) in loading state_dict for \u001b[39m\u001b[38;5;132;01m{}\u001b[39;00m\u001b[38;5;124m:\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;130;01m\\t\u001b[39;00m\u001b[38;5;132;01m{}\u001b[39;00m\u001b[38;5;124m'\u001b[39m\u001b[38;5;241m.\u001b[39mformat(\n\u001b[0;32m 2042\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__class__\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__name__\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;130;01m\\t\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;241m.\u001b[39mjoin(error_msgs)))\n\u001b[0;32m 2043\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m _IncompatibleKeys(missing_keys, unexpected_keys)\n",
"\u001b[1;31mRuntimeError\u001b[0m: Error(s) in loading state_dict for BertModel:\n\tUnexpected key(s) in state_dict: \"embeddings.position_ids\". "
]
}
],
"source": [
"# load spaCy language model\n",
"# the transformer pipeline has no static word vectors,\n",
"# so it cannot be used to calculate similarities;\n",
"# sentence-transformers is used for that instead\n",
"# (loading failed here with the RuntimeError shown in the output,\n",
"#  likely a version mismatch between the model package and the installed transformers library)\n",
"nlp = spacy.load('de_dep_news_trf')\n",
"#nlp = spacy.load('de_core_news_lg')"
]
},
{
"cell_type": "code",
"execution_count": 343,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: sentence-transformers/all-mpnet-base-v2\n",
"INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: cpu\n"
]
}
],
"source": [
"model_stfr = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')"
]
},
{
"cell_type": "code",
"execution_count": 344,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 129020 entries, 0 to 129019\n",
"Data columns (total 20 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 VorgangsID 129020 non-null int64 \n",
" 1 ObjektID 129020 non-null int64 \n",
" 2 HObjektText 129003 non-null object \n",
" 3 ObjektArtID 129020 non-null int64 \n",
" 4 ObjektArtText 128372 non-null object \n",
" 5 VorgangsTypID 129020 non-null int64 \n",
" 6 VorgangsTypName 129020 non-null object \n",
" 7 VorgangsDatum 129020 non-null datetime64[ns]\n",
" 8 VorgangsStatusId 129020 non-null int64 \n",
" 9 VorgangsPrioritaet 129020 non-null int64 \n",
" 10 VorgangsBeschreibung 124087 non-null object \n",
" 11 VorgangsOrt 507 non-null object \n",
" 12 VorgangsArtText 129020 non-null object \n",
" 13 ErledigungsDatum 129020 non-null datetime64[ns]\n",
" 14 ErledigungsArtText 128474 non-null object \n",
" 15 ErledigungsBeschreibung 118135 non-null object \n",
" 16 MPMelderArbeitsplatz 6359 non-null object \n",
" 17 MPAbteilungBezeichnung 6359 non-null object \n",
" 18 Arbeitsbeginn 123538 non-null datetime64[ns]\n",
" 19 ErstellungsDatum 129020 non-null datetime64[ns]\n",
"dtypes: datetime64[ns](4), int64(6), object(10)\n",
"memory usage: 19.7+ MB\n"
]
}
],
"source": [
"# load dataset\n",
"FILE_PATH = '01_2_Rohdaten_neu/Export4.csv'\n",
"date_cols = ['VorgangsDatum', 'ErledigungsDatum', 'Arbeitsbeginn', 'ErstellungsDatum']\n",
"raw = pd.read_csv(filepath_or_buffer=FILE_PATH, sep=';', encoding='cp1252', parse_dates=date_cols, dayfirst=True)\n",
"raw.info()"
]
},
{
"cell_type": "code",
"execution_count": 345,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>VorgangsID</th>\n",
" <th>ObjektID</th>\n",
" <th>HObjektText</th>\n",
" <th>ObjektArtID</th>\n",
" <th>ObjektArtText</th>\n",
" <th>VorgangsTypID</th>\n",
" <th>VorgangsTypName</th>\n",
" <th>VorgangsDatum</th>\n",
" <th>VorgangsStatusId</th>\n",
" <th>VorgangsPrioritaet</th>\n",
" <th>VorgangsBeschreibung</th>\n",
" <th>VorgangsOrt</th>\n",
" <th>VorgangsArtText</th>\n",
" <th>ErledigungsDatum</th>\n",
" <th>ErledigungsArtText</th>\n",
" <th>ErledigungsBeschreibung</th>\n",
" <th>MPMelderArbeitsplatz</th>\n",
" <th>MPAbteilungBezeichnung</th>\n",
" <th>Arbeitsbeginn</th>\n",
" <th>ErstellungsDatum</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>11</td>\n",
" <td>114</td>\n",
" <td>427 C , Webmaschine, DL 280 EMS Breite 280</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-06</td>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Kettbaum kaputt</td>\n",
" <td>2019-03-06</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Weberei</td>\n",
" <td>Weberei</td>\n",
" <td>NaT</td>\n",
" <td>2019-03-06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>17</td>\n",
" <td>124</td>\n",
" <td>621 C , Webmaschine, DL 280 EMS Breite 280</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-11</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
|
||
" <td>asgasdg</td>\n",
|
||
" <td>2019-03-11</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Elektrowerkstatt</td>\n",
|
||
" <td>Elektrowerkstatt</td>\n",
|
||
" <td>NaT</td>\n",
|
||
" <td>2019-03-11</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>53</td>\n",
|
||
" <td>244</td>\n",
|
||
" <td>285 C, Webmaschine, SG 220 EMS</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>Greifer-Webmaschine</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Reparaturauftrag (Portal)</td>\n",
|
||
" <td>2019-03-19</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Kupplung schleift</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Kupplung defekt</td>\n",
|
||
" <td>2019-03-20</td>\n",
|
||
" <td>Reparatur UTT</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Weberei</td>\n",
|
||
" <td>Weberei</td>\n",
|
||
" <td>NaT</td>\n",
|
||
" <td>2019-03-19</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>58</td>\n",
|
||
" <td>257</td>\n",
|
||
" <td>107, Webmaschine, OM 220 EOS</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Luft-Webmaschine</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Reparaturauftrag (Portal)</td>\n",
|
||
" <td>2019-03-21</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Gegengewicht wieder anbringen</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Gegengewicht an der Webmaschine abgefallen</td>\n",
|
||
" <td>2019-03-21</td>\n",
|
||
" <td>Reparatur UTT</td>\n",
|
||
" <td>Schraube ausgebohrt\\nGegengewicht wieder angeb...</td>\n",
|
||
" <td>Weberei</td>\n",
|
||
" <td>Weberei</td>\n",
|
||
" <td>2019-03-21</td>\n",
|
||
" <td>2019-03-21</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>81</td>\n",
|
||
" <td>138</td>\n",
|
||
" <td>00138, Schärmaschine 9,</td>\n",
|
||
" <td>16</td>\n",
|
||
" <td>Schärmaschine</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Reparaturauftrag (Portal)</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>da ist etwas gebrochen. (Herr Heininger)</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>zentrale Bremsenverstellung linke Gatterseite ...</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>Reparatur UTT</td>\n",
|
||
" <td>Bolzen gebrochen. Bolzen neu angefertig und di...</td>\n",
|
||
" <td>Vorwerk</td>\n",
|
||
" <td>Vorwerk</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" VorgangsID ObjektID HObjektText \\\n",
|
||
"0 11 114 427 C , Webmaschine, DL 280 EMS Breite 280 \n",
|
||
"1 17 124 621 C , Webmaschine, DL 280 EMS Breite 280 \n",
|
||
"2 53 244 285 C, Webmaschine, SG 220 EMS \n",
|
||
"3 58 257 107, Webmaschine, OM 220 EOS \n",
|
||
"4 81 138 00138, Schärmaschine 9, \n",
|
||
"\n",
|
||
" ObjektArtID ObjektArtText VorgangsTypID VorgangsTypName \\\n",
|
||
"0 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n",
|
||
"1 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n",
|
||
"2 5 Greifer-Webmaschine 3 Reparaturauftrag (Portal) \n",
|
||
"3 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n",
|
||
"4 16 Schärmaschine 3 Reparaturauftrag (Portal) \n",
|
||
"\n",
|
||
" VorgangsDatum VorgangsStatusId VorgangsPrioritaet \\\n",
|
||
"0 2019-03-06 4 0 \n",
|
||
"1 2019-03-11 5 0 \n",
|
||
"2 2019-03-19 5 0 \n",
|
||
"3 2019-03-21 5 0 \n",
|
||
"4 2019-03-25 5 0 \n",
|
||
"\n",
|
||
" VorgangsBeschreibung VorgangsOrt \\\n",
|
||
"0 NaN NaN \n",
|
||
"1 NaN NaN \n",
|
||
"2 Kupplung schleift NaN \n",
|
||
"3 Gegengewicht wieder anbringen NaN \n",
|
||
"4 da ist etwas gebrochen. (Herr Heininger) NaN \n",
|
||
"\n",
|
||
" VorgangsArtText ErledigungsDatum \\\n",
|
||
"0 Kettbaum kaputt 2019-03-06 \n",
|
||
"1 asgasdg 2019-03-11 \n",
|
||
"2 Kupplung defekt 2019-03-20 \n",
|
||
"3 Gegengewicht an der Webmaschine abgefallen 2019-03-21 \n",
|
||
"4 zentrale Bremsenverstellung linke Gatterseite ... 2019-03-25 \n",
|
||
"\n",
|
||
" ErledigungsArtText ErledigungsBeschreibung \\\n",
|
||
"0 NaN NaN \n",
|
||
"1 NaN NaN \n",
|
||
"2 Reparatur UTT NaN \n",
|
||
"3 Reparatur UTT Schraube ausgebohrt\\nGegengewicht wieder angeb... \n",
|
||
"4 Reparatur UTT Bolzen gebrochen. Bolzen neu angefertig und di... \n",
|
||
"\n",
|
||
" MPMelderArbeitsplatz MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n",
|
||
"0 Weberei Weberei NaT 2019-03-06 \n",
|
||
"1 Elektrowerkstatt Elektrowerkstatt NaT 2019-03-11 \n",
|
||
"2 Weberei Weberei NaT 2019-03-19 \n",
|
||
"3 Weberei Weberei 2019-03-21 2019-03-21 \n",
|
||
"4 Vorwerk Vorwerk 2019-03-25 2019-03-25 "
|
||
]
|
||
},
|
||
"execution_count": 345,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"raw.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 346,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Number of features: 20\n"
]
}
],
"source": [
"print(f\"Number of features: {len(raw.columns)}\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**New features compared to the previous analysis:**\n",
"- ``ObjektArtID``\n",
"- ``ObjektArtText``\n",
"- ``VorgangsTypName``"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Duplicates"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 347,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"duplicates_filt = raw.duplicated()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 348,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Number of duplicates: 84\n"
]
}
],
"source": [
"print(f\"Number of duplicates: {duplicates_filt.sum()}\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 349,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"filt_data = raw[duplicates_filt]\n",
|
||
"uni_obj_id_dupl = filt_data['ObjektID'].unique()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 350,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Number of unique object IDs among duplicates: 47\n"
]
}
],
"source": [
"print(f\"Number of unique object IDs among duplicates: {len(uni_obj_id_dupl)}\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 351,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"<class 'pandas.core.frame.DataFrame'>\n",
|
||
"RangeIndex: 128936 entries, 0 to 128935\n",
|
||
"Data columns (total 20 columns):\n",
|
||
" # Column Non-Null Count Dtype \n",
|
||
"--- ------ -------------- ----- \n",
|
||
" 0 VorgangsID 128936 non-null int64 \n",
|
||
" 1 ObjektID 128936 non-null int64 \n",
|
||
" 2 HObjektText 128920 non-null object \n",
|
||
" 3 ObjektArtID 128936 non-null int64 \n",
|
||
" 4 ObjektArtText 128289 non-null object \n",
|
||
" 5 VorgangsTypID 128936 non-null int64 \n",
|
||
" 6 VorgangsTypName 128936 non-null object \n",
|
||
" 7 VorgangsDatum 128936 non-null datetime64[ns]\n",
|
||
" 8 VorgangsStatusId 128936 non-null int64 \n",
|
||
" 9 VorgangsPrioritaet 128936 non-null int64 \n",
|
||
" 10 VorgangsBeschreibung 124008 non-null object \n",
|
||
" 11 VorgangsOrt 507 non-null object \n",
|
||
" 12 VorgangsArtText 128936 non-null object \n",
|
||
" 13 ErledigungsDatum 128936 non-null datetime64[ns]\n",
|
||
" 14 ErledigungsArtText 128402 non-null object \n",
|
||
" 15 ErledigungsBeschreibung 118086 non-null object \n",
|
||
" 16 MPMelderArbeitsplatz 6337 non-null object \n",
|
||
" 17 MPAbteilungBezeichnung 6337 non-null object \n",
|
||
" 18 Arbeitsbeginn 123480 non-null datetime64[ns]\n",
|
||
" 19 ErstellungsDatum 128936 non-null datetime64[ns]\n",
|
||
"dtypes: datetime64[ns](4), int64(6), object(10)\n",
|
||
"memory usage: 19.7+ MB\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"wo_duplicates = raw.drop_duplicates(ignore_index=True)\n",
|
||
"wo_duplicates.info()"
|
||
]
|
||
},
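`raw.duplicated()` uses the default `keep='first'`, so it flags only the second and later copies of a fully identical row. The 84 flagged rows are therefore repeats, not 84 distinct duplicated records. The minimal sketch below uses a hypothetical toy frame (column names borrowed from the notebook, data made up). It shows how `keep=False` marks every member of a duplicate group, which helps when inspecting the groups before `drop_duplicates` removes them:

```python
import pandas as pd

# hypothetical toy data; column names borrowed from the notebook
df = pd.DataFrame({
    "VorgangsID": [1, 1, 2, 3, 3, 3],
    "ObjektID": [10, 10, 20, 30, 30, 30],
    "VorgangsBeschreibung": ["a", "a", "b", "c", "c", "c"],
})

# keep='first' (default): only the repeats are flagged
num_repeats = int(df.duplicated().sum())

# keep=False: every member of a duplicate group is flagged
all_copies = df[df.duplicated(keep=False)]

# total size of each duplicate group
group_sizes = all_copies.groupby(list(df.columns)).size()

# the deduplicated frame keeps one copy per group
deduped = df.drop_duplicates(ignore_index=True)

print(num_repeats)
print(group_sizes)
print(len(deduped))
```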
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### ``VorgangsBeschreibung``"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### **NA values and duplicates**"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"String cleaning"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 352,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"SPECIAL_CHARS = set(['&', '$', '%', '§', '/', '(', ')', '_', \n",
|
||
" '+', '–', '--', '<', '>', '´',\n",
|
||
"])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 353,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def clean_string_slim(string: str) -> str:\n",
"    # replace whitespace control characters (tab, newline, ...) with a space\n",
"    pattern = r'[\\t\\n\\r\\f\\v]'\n",
"    string = re.sub(pattern, ' ', string)\n",
"    # remove whitespace at the beginning and the end\n",
"    string = string.strip()\n",
"\n",
"    return string\n",
"\n",
"def clean_string(string: str) -> str:\n",
"    # replace whitespace control characters (tab, newline, ...) with a space\n",
"    pattern = r'[\\t\\n\\r\\f\\v]'\n",
"    string = re.sub(pattern, ' ', string)\n",
"    # remove dates (e.g. 06.03.2019); this also removes times of the form hh:mm:ss\n",
"    pattern = r'[\\d]{1,4}[.:][\\d]{1,4}[.:][\\d]{1,4}'\n",
"    string = re.sub(pattern, '', string)\n",
"    # remove the remaining times of the form hh:mm\n",
"    pattern = r'\\b[\\d]{1,2}:[\\d]{2}\\b'\n",
"    string = re.sub(pattern, '', string)\n",
"    # remove all characters except spaces, word characters and the punctuation . , ; : -\n",
"    pattern = r'[^ \\w.,;:\\-äöüÄÖÜ]+'\n",
"    string = re.sub(pattern, '', string)\n",
"    # remove '-' where it is used as a dash between words rather than as a hyphen\n",
"    pattern = r'[\\W]+-[\\W]+'\n",
"    string = re.sub(pattern, ' ', string)\n",
"    # remove whitespace in front of punctuation\n",
"    pattern = r'[ ]+([;,.:])'\n",
"    string = re.sub(pattern, r'\\1', string)\n",
"    # collapse multiple spaces into one\n",
"    pattern = r'[ ]+'\n",
"    string = re.sub(pattern, ' ', string)\n",
"    # remove whitespace at the beginning and the end\n",
"    string = string.strip()\n",
"\n",
"    return string"
|
||
]
|
||
},
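The order of the date and time steps matters. The date pattern `[.:]` also accepts colons, so it already removes `hh:mm:ss`. A time pattern that requires two colons (such as `\d{1,2}:\d{1,2}:\d{0,2}`) then misses plain `hh:mm` values. The sketch below uses a made-up sentence and shows this behavior. The `\b\d{1,2}:\d{2}\b` pattern for `hh:mm` is an assumption, not taken from the notebook:

```python
import re

# date pattern as used in the notebook; it also matches hh:mm:ss
DATE = r"[\d]{1,4}[.:][\d]{1,4}[.:][\d]{1,4}"
# a time pattern that needs two colons misses plain hh:mm
TWO_COLON_TIME = r"[\d]{1,2}[:][\d]{1,2}[:][\d]{0,2}"
# assumed pattern for the remaining hh:mm times
HHMM = r"\b\d{1,2}:\d{2}\b"

text = "Lager getauscht am 06.03.2019 um 14:30, Stillstand bis 16:05:10"

step = re.sub(DATE, "", text)                 # removes 06.03.2019 and 16:05:10
missed = re.search(TWO_COLON_TIME, step)      # None: 14:30 is not matched
step = re.sub(HHMM, "", step)                 # removes 14:30
step = re.sub(r"[ ]+([;,.:])", r"\1", step)   # drop spaces before punctuation
cleaned = re.sub(r"[ ]+", " ", step).strip()  # collapse and trim spaces
print(cleaned)
```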
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 354,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"base = wo_duplicates.copy()\n",
|
||
"base = base.dropna(axis=0, subset='VorgangsBeschreibung')\n",
|
||
"# preprocessing\n",
|
||
"#base['VorgangsBeschreibung'] = base['VorgangsBeschreibung'].map(clean_string)\n",
|
||
"base['VorgangsBeschreibung'] = base['VorgangsBeschreibung'].map(clean_string_slim)"
|
||
]
|
||
},
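The cleaning cell applies `clean_string_slim` with `Series.map`, which makes one Python call per row. It must run after `dropna`, because `re.sub` raises on `NaN`. The sketch below expresses the same two steps with the vectorized `Series.str` accessor instead. It is a possible alternative, not the notebook's code, and the sample strings are made up. The `.str` methods pass missing values through as `NaN`:

```python
import numpy as np
import pandas as pd

# made-up descriptions with control characters and a missing value
s = pd.Series(["Kupplung\tschleift\n", "  Lager\r\ngetauscht ", np.nan])

cleaned = (
    s.str.replace(r"[\t\n\r\f\v]", " ", regex=True)  # control whitespace -> space
     .str.strip()                                    # trim both ends
)
print(cleaned.tolist())
```

Like `clean_string_slim`, this turns `\r\n` into two spaces. Collapsing them would need an extra `.str.replace(r"[ ]+", " ", regex=True)`, as `clean_string` does.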
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 355,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>VorgangsID</th>\n",
|
||
" <th>ObjektID</th>\n",
|
||
" <th>HObjektText</th>\n",
|
||
" <th>ObjektArtID</th>\n",
|
||
" <th>ObjektArtText</th>\n",
|
||
" <th>VorgangsTypID</th>\n",
|
||
" <th>VorgangsTypName</th>\n",
|
||
" <th>VorgangsDatum</th>\n",
|
||
" <th>VorgangsStatusId</th>\n",
|
||
" <th>VorgangsPrioritaet</th>\n",
|
||
" <th>VorgangsBeschreibung</th>\n",
|
||
" <th>VorgangsOrt</th>\n",
|
||
" <th>VorgangsArtText</th>\n",
|
||
" <th>ErledigungsDatum</th>\n",
|
||
" <th>ErledigungsArtText</th>\n",
|
||
" <th>ErledigungsBeschreibung</th>\n",
|
||
" <th>MPMelderArbeitsplatz</th>\n",
|
||
" <th>MPAbteilungBezeichnung</th>\n",
|
||
" <th>Arbeitsbeginn</th>\n",
|
||
" <th>ErstellungsDatum</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>53</td>\n",
|
||
" <td>244</td>\n",
|
||
" <td>285 C, Webmaschine, SG 220 EMS</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>Greifer-Webmaschine</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Reparaturauftrag (Portal)</td>\n",
|
||
" <td>2019-03-19</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Kupplung schleift</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Kupplung defekt</td>\n",
|
||
" <td>2019-03-20</td>\n",
|
||
" <td>Reparatur UTT</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Weberei</td>\n",
|
||
" <td>Weberei</td>\n",
|
||
" <td>NaT</td>\n",
|
||
" <td>2019-03-19</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>58</td>\n",
|
||
" <td>257</td>\n",
|
||
" <td>107, Webmaschine, OM 220 EOS</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Luft-Webmaschine</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Reparaturauftrag (Portal)</td>\n",
|
||
" <td>2019-03-21</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Gegengewicht wieder anbringen</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Gegengewicht an der Webmaschine abgefallen</td>\n",
|
||
" <td>2019-03-21</td>\n",
|
||
" <td>Reparatur UTT</td>\n",
|
||
" <td>Schraube ausgebohrt\\nGegengewicht wieder angeb...</td>\n",
|
||
" <td>Weberei</td>\n",
|
||
" <td>Weberei</td>\n",
|
||
" <td>2019-03-21</td>\n",
|
||
" <td>2019-03-21</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>81</td>\n",
|
||
" <td>138</td>\n",
|
||
" <td>00138, Schärmaschine 9,</td>\n",
|
||
" <td>16</td>\n",
|
||
" <td>Schärmaschine</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Reparaturauftrag (Portal)</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>da ist etwas gebrochen. (Herr Heininger)</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>zentrale Bremsenverstellung linke Gatterseite ...</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>Reparatur UTT</td>\n",
|
||
" <td>Bolzen gebrochen. Bolzen neu angefertig und di...</td>\n",
|
||
" <td>Vorwerk</td>\n",
|
||
" <td>Vorwerk</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>82</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Warenschau allgemein</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Reparaturauftrag (Portal)</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Klappbügel Portalkran H31 defekt</td>\n",
|
||
" <td>Warenschau allgemein</td>\n",
|
||
" <td>Allgemeine Reparaturarbeiten</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>Reparatur UTT</td>\n",
|
||
" <td>Feder ausgetauscht</td>\n",
|
||
" <td>Warenschau</td>\n",
|
||
" <td>Warenschau</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6</th>\n",
|
||
" <td>76</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Neben der Türe</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Reparaturauftrag (Portal)</td>\n",
|
||
" <td>2019-03-22</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Schraube nix mer gut</td>\n",
|
||
" <td>Neben der Türe</td>\n",
|
||
" <td>Kettbaum</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>Reparatur UTT</td>\n",
|
||
" <td>Schrauben ausgebohrt\\t\\nGewinde nachgeschnitten\\t</td>\n",
|
||
" <td>Vorwerk</td>\n",
|
||
" <td>Vorwerk</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>2019-03-22</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>128931</th>\n",
|
||
" <td>518956</td>\n",
|
||
" <td>1708</td>\n",
|
||
" <td>01708, Betriebsfahrräder Schlosserei,</td>\n",
|
||
" <td>57</td>\n",
|
||
" <td>Interne Wartungsobjekte</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Wartung</td>\n",
|
||
" <td>2023-06-19</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>2-wöchige Reinigung & Sichtkontrolle (Technisc...</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>02 Interne Reinigung / Pflege / Überprüfung</td>\n",
|
||
" <td>2023-06-19</td>\n",
|
||
" <td>Intern UTT - Prüfung</td>\n",
|
||
" <td>Reinigung & Sichtkontrolle (Technische Einric...</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>2023-06-19</td>\n",
|
||
" <td>2023-03-14</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>128932</th>\n",
|
||
" <td>275123</td>\n",
|
||
" <td>1654</td>\n",
|
||
" <td>WEBEREI ALLGEMEIN, Weberei allgemein,</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>UTT allgemein</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Reparaturauftrag (Portal)</td>\n",
|
||
" <td>2022-09-29</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Adapter entfernen und Gewinde nachschneiden.</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Kettbaum-Adapter</td>\n",
|
||
" <td>2022-09-30</td>\n",
|
||
" <td>Intern UTT - Reparatur</td>\n",
|
||
" <td>mit schlosserei aufräumen</td>\n",
|
||
" <td>Weberei</td>\n",
|
||
" <td>Weberei</td>\n",
|
||
" <td>2022-09-30</td>\n",
|
||
" <td>2022-09-29</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>128933</th>\n",
|
||
" <td>275125</td>\n",
|
||
" <td>1795</td>\n",
|
||
" <td>A054.S, Jacquardmaschine,</td>\n",
|
||
" <td>24</td>\n",
|
||
" <td>Stäubli-Jacquardmaschine</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Reparaturauftrag (Portal)</td>\n",
|
||
" <td>2022-09-30</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Alle 4 Schrauben und teile der Kettbaumlagerun...</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Kettbaum</td>\n",
|
||
" <td>2022-09-30</td>\n",
|
||
" <td>Intern UTT - Reparatur</td>\n",
|
||
" <td>Neues Teil eingebaut und altes repariert</td>\n",
|
||
" <td>Weberei</td>\n",
|
||
" <td>Weberei</td>\n",
|
||
" <td>2022-09-30</td>\n",
|
||
" <td>2022-09-30</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>128934</th>\n",
|
||
" <td>275188</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>00001, Ausrüstungsanlage 1,</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Waschmaschine</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Reparaturauftrag (Portal)</td>\n",
|
||
" <td>2022-09-30</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Walzenlager WK 6 überprüfen/auswechseln</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Lagereinheit (Wälzlager, Kugellager, etc.)</td>\n",
|
||
" <td>2022-10-04</td>\n",
|
||
" <td>Intern UTT - Reparatur</td>\n",
|
||
" <td>Lager getauscht</td>\n",
|
||
" <td>Ausrüstung</td>\n",
|
||
" <td>Ausrüstung</td>\n",
|
||
" <td>2022-10-04</td>\n",
|
||
" <td>2022-09-30</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>128935</th>\n",
|
||
" <td>275219</td>\n",
|
||
" <td>326</td>\n",
|
||
" <td>B38, Niederhubwagen,</td>\n",
|
||
" <td>32</td>\n",
|
||
" <td>Flurförderzeuge / Putzmaschine / Rasenmäher</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Reparaturauftrag (Portal)</td>\n",
|
||
" <td>2022-10-03</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Befestigung Deckel für Batteriefach defekt ...</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Flurförderzeug</td>\n",
|
||
" <td>2022-10-05</td>\n",
|
||
" <td>Intern UTT - Reparatur</td>\n",
|
||
" <td>Neue Gasfeder eingebaut</td>\n",
|
||
" <td>Warenschau</td>\n",
|
||
" <td>Warenschau</td>\n",
|
||
" <td>2022-10-04</td>\n",
|
||
" <td>2022-10-03</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>124008 rows × 20 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" VorgangsID ObjektID HObjektText \\\n",
|
||
"2 53 244 285 C, Webmaschine, SG 220 EMS \n",
|
||
"3 58 257 107, Webmaschine, OM 220 EOS \n",
|
||
"4 81 138 00138, Schärmaschine 9, \n",
|
||
"5 82 0 Warenschau allgemein \n",
|
||
"6 76 0 Neben der Türe \n",
|
||
"... ... ... ... \n",
|
||
"128931 518956 1708 01708, Betriebsfahrräder Schlosserei, \n",
|
||
"128932 275123 1654 WEBEREI ALLGEMEIN, Weberei allgemein, \n",
|
||
"128933 275125 1795 A054.S, Jacquardmaschine, \n",
|
||
"128934 275188 1 00001, Ausrüstungsanlage 1, \n",
|
||
"128935 275219 326 B38, Niederhubwagen, \n",
|
||
"\n",
|
||
" ObjektArtID ObjektArtText \\\n",
|
||
"2 5 Greifer-Webmaschine \n",
|
||
"3 3 Luft-Webmaschine \n",
|
||
"4 16 Schärmaschine \n",
|
||
"5 0 NaN \n",
|
||
"6 0 NaN \n",
|
||
"... ... ... \n",
|
||
"128931 57 Interne Wartungsobjekte \n",
|
||
"128932 90 UTT allgemein \n",
|
||
"128933 24 Stäubli-Jacquardmaschine \n",
|
||
"128934 1 Waschmaschine \n",
|
||
"128935 32 Flurförderzeuge / Putzmaschine / Rasenmäher \n",
|
||
"\n",
|
||
" VorgangsTypID VorgangsTypName VorgangsDatum \\\n",
|
||
"2 3 Reparaturauftrag (Portal) 2019-03-19 \n",
|
||
"3 3 Reparaturauftrag (Portal) 2019-03-21 \n",
|
||
"4 3 Reparaturauftrag (Portal) 2019-03-25 \n",
|
||
"5 3 Reparaturauftrag (Portal) 2019-03-25 \n",
|
||
"6 3 Reparaturauftrag (Portal) 2019-03-22 \n",
|
||
"... ... ... ... \n",
|
||
"128931 1 Wartung 2023-06-19 \n",
|
||
"128932 3 Reparaturauftrag (Portal) 2022-09-29 \n",
|
||
"128933 3 Reparaturauftrag (Portal) 2022-09-30 \n",
|
||
"128934 3 Reparaturauftrag (Portal) 2022-09-30 \n",
|
||
"128935 3 Reparaturauftrag (Portal) 2022-10-03 \n",
|
||
"\n",
|
||
" VorgangsStatusId VorgangsPrioritaet \\\n",
|
||
"2 5 0 \n",
|
||
"3 5 0 \n",
|
||
"4 5 0 \n",
|
||
"5 5 0 \n",
|
||
"6 5 0 \n",
|
||
"... ... ... \n",
|
||
"128931 5 0 \n",
|
||
"128932 5 0 \n",
|
||
"128933 5 0 \n",
|
||
"128934 5 1 \n",
|
||
"128935 5 0 \n",
|
||
"\n",
|
||
" VorgangsBeschreibung \\\n",
|
||
"2 Kupplung schleift \n",
|
||
"3 Gegengewicht wieder anbringen \n",
|
||
"4 da ist etwas gebrochen. (Herr Heininger) \n",
|
||
"5 Klappbügel Portalkran H31 defekt \n",
|
||
"6 Schraube nix mer gut \n",
|
||
"... ... \n",
|
||
"128931 2-wöchige Reinigung & Sichtkontrolle (Technisc... \n",
|
||
"128932 Adapter entfernen und Gewinde nachschneiden. \n",
|
||
"128933 Alle 4 Schrauben und teile der Kettbaumlagerun... \n",
|
||
"128934 Walzenlager WK 6 überprüfen/auswechseln \n",
|
||
"128935 Befestigung Deckel für Batteriefach defekt ... \n",
|
||
"\n",
|
||
" VorgangsOrt \\\n",
|
||
"2 NaN \n",
|
||
"3 NaN \n",
|
||
"4 NaN \n",
|
||
"5 Warenschau allgemein \n",
|
||
"6 Neben der Türe \n",
|
||
"... ... \n",
|
||
"128931 NaN \n",
|
||
"128932 NaN \n",
|
||
"128933 NaN \n",
|
||
"128934 NaN \n",
|
||
"128935 NaN \n",
|
||
"\n",
|
||
" VorgangsArtText ErledigungsDatum \\\n",
|
||
"2 Kupplung defekt 2019-03-20 \n",
|
||
"3 Gegengewicht an der Webmaschine abgefallen 2019-03-21 \n",
|
||
"4 zentrale Bremsenverstellung linke Gatterseite ... 2019-03-25 \n",
|
||
"5 Allgemeine Reparaturarbeiten 2019-03-25 \n",
|
||
"6 Kettbaum 2019-03-25 \n",
|
||
"... ... ... \n",
|
||
"128931 02 Interne Reinigung / Pflege / Überprüfung 2023-06-19 \n",
|
||
"128932 Kettbaum-Adapter 2022-09-30 \n",
|
||
"128933 Kettbaum 2022-09-30 \n",
|
||
"128934 Lagereinheit (Wälzlager, Kugellager, etc.) 2022-10-04 \n",
|
||
"128935 Flurförderzeug 2022-10-05 \n",
|
||
"\n",
|
||
" ErledigungsArtText \\\n",
|
||
"2 Reparatur UTT \n",
|
||
"3 Reparatur UTT \n",
|
||
"4 Reparatur UTT \n",
|
||
"5 Reparatur UTT \n",
|
||
"6 Reparatur UTT \n",
|
||
"... ... \n",
|
||
"128931 Intern UTT - Prüfung \n",
|
||
"128932 Intern UTT - Reparatur \n",
|
||
"128933 Intern UTT - Reparatur \n",
|
||
"128934 Intern UTT - Reparatur \n",
|
||
"128935 Intern UTT - Reparatur \n",
|
||
"\n",
|
||
" ErledigungsBeschreibung \\\n",
|
||
"2 NaN \n",
|
||
"3 Schraube ausgebohrt\\nGegengewicht wieder angeb... \n",
|
||
"4 Bolzen gebrochen. Bolzen neu angefertig und di... \n",
|
||
"5 Feder ausgetauscht \n",
|
||
"6 Schrauben ausgebohrt\\t\\nGewinde nachgeschnitten\\t \n",
|
||
"... ... \n",
|
||
"128931 Reinigung & Sichtkontrolle (Technische Einric... \n",
|
||
"128932 mit schlosserei aufräumen \n",
|
||
"128933 Neues Teil eingebaut und altes repariert \n",
|
||
"128934 Lager getauscht \n",
|
||
"128935 Neue Gasfeder eingebaut \n",
|
||
"\n",
|
||
" MPMelderArbeitsplatz MPAbteilungBezeichnung Arbeitsbeginn \\\n",
|
||
"2 Weberei Weberei NaT \n",
|
||
"3 Weberei Weberei 2019-03-21 \n",
|
||
"4 Vorwerk Vorwerk 2019-03-25 \n",
|
||
"5 Warenschau Warenschau 2019-03-25 \n",
|
||
"6 Vorwerk Vorwerk 2019-03-25 \n",
|
||
"... ... ... ... \n",
|
||
"128931 NaN NaN 2023-06-19 \n",
|
||
"128932 Weberei Weberei 2022-09-30 \n",
|
||
"128933 Weberei Weberei 2022-09-30 \n",
|
||
"128934 Ausrüstung Ausrüstung 2022-10-04 \n",
|
||
"128935 Warenschau Warenschau 2022-10-04 \n",
|
||
"\n",
|
||
" ErstellungsDatum \n",
|
||
"2 2019-03-19 \n",
|
||
"3 2019-03-21 \n",
|
||
"4 2019-03-25 \n",
|
||
"5 2019-03-25 \n",
|
||
"6 2019-03-22 \n",
|
||
"... ... \n",
|
||
"128931 2023-03-14 \n",
|
||
"128932 2022-09-29 \n",
|
||
"128933 2022-09-30 \n",
|
||
"128934 2022-09-30 \n",
|
||
"128935 2022-10-03 \n",
|
||
"\n",
|
||
"[124008 rows x 20 columns]"
|
||
]
|
||
},
|
||
"execution_count": 355,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"base"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 356,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Entries: 124008\n"
]
}
],
"source": [
"descriptions = base['VorgangsBeschreibung']\n",
"print(f\"Entries: {len(descriptions)}\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 357,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Number of duplicate descriptions: 117208\n",
"Number of unique descriptions: 6800\n",
"Share of unique descriptions: 5.48 %\n"
]
}
],
"source": [
"num_dupl_descr = descriptions.duplicated().sum()\n",
"uni_descr = descriptions.unique()\n",
"num_uni_descr = len(uni_descr)\n",
"\n",
"print(f\"Number of duplicate descriptions: {num_dupl_descr}\")\n",
"print(f\"Number of unique descriptions: {num_uni_descr}\")\n",
"print(f\"Share of unique descriptions: {num_uni_descr / len(descriptions) * 100:.2f} %\")"
|
||
]
|
||
},
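The same statistics can also be read off a single `value_counts` call. It returns the number of occurrences per distinct value, sorted in descending order. This is the quantity the following loop stores as `num_occur`. A sketch on made-up data:

```python
import pandas as pd

# made-up descriptions
descriptions = pd.Series(["Tägliche Wartung", "Kupplung defekt", "Tägliche Wartung",
                          "Tägliche Wartung", "Lager getauscht"])

counts = descriptions.value_counts()             # occurrences per distinct description
num_unique = len(counts)                         # equals len(descriptions.unique()) here
num_dupl = int(descriptions.duplicated().sum())  # rows that repeat an earlier value
share_unique = num_unique / len(descriptions) * 100

print(counts.head(3))
print(f"{num_unique} unique, {num_dupl} duplicates, {share_unique:.2f} % unique")
```

`value_counts` drops `NaN` by default while `unique()` keeps it. The two only agree because the notebook has already removed missing descriptions.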
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 358,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"if not LOAD_CALC_FILES:\n",
"    cols = ['descr', 'len', 'num_occur', 'assoc_obj_ids', 'num_assoc_obj_ids']\n",
"    rows = []\n",
"    max_val = 0\n",
"    text = None\n",
"    index = 0\n",
"\n",
"    for idx, description in enumerate(uni_descr):\n",
"        len_descr = len(description)\n",
"        filt = base['VorgangsBeschreibung'] == description\n",
"        temp = base[filt]\n",
"        # sorted unique object IDs that share this description\n",
"        assoc_obj_ids = np.sort(temp['ObjektID'].unique(), kind='stable')\n",
"        num_assoc_obj_ids = len(assoc_obj_ids)\n",
"        num_dupl = filt.sum()\n",
"\n",
"        # collect the rows in a list; the DataFrame is built once after the loop\n",
"        rows.append([description, len_descr, num_dupl, assoc_obj_ids, num_assoc_obj_ids])\n",
"\n",
"        # remember the most frequent description\n",
"        if num_dupl > max_val:\n",
"            max_val = num_dupl\n",
"            index = idx\n",
"            text = description\n",
"\n",
"    descr_df = pd.DataFrame(rows, columns=cols)\n",
"    temp1 = descr_df.sort_values(by='num_occur', ascending=False)"
|
||
]
|
||
},
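The loop filters the full frame once per distinct description, so its cost grows with (number of rows) × (number of distinct descriptions). The sketch below produces the same columns in a single pass with `groupby`. It assumes the notebook's column names (`VorgangsBeschreibung`, `ObjektID`) and uses a made-up stand-in for `base`. `groupby` sorts by key and drops `NaN` keys, so only the order before the final sort differs from the loop:

```python
import numpy as np
import pandas as pd

# made-up stand-in for the notebook's `base` frame
base = pd.DataFrame({
    "VorgangsBeschreibung": ["Wartung", "Wartung", "Kupplung defekt", "Wartung"],
    "ObjektID": [42, 17, 244, 42],
})

# one pass over the data instead of one boolean filter per description
grouped = base.groupby("VorgangsBeschreibung")["ObjektID"]
descr_df = pd.DataFrame({
    "num_occur": grouped.size(),                     # rows per description
    "assoc_obj_ids": grouped.unique().map(np.sort),  # sorted distinct object IDs
    "num_assoc_obj_ids": grouped.nunique(),          # number of distinct object IDs
})
descr_df = descr_df.rename_axis("descr").reset_index()
descr_df.insert(1, "len", descr_df["descr"].str.len())  # length in characters
descr_df = descr_df.sort_values(by="num_occur", ascending=False, ignore_index=True)
print(descr_df)
```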
{
"cell_type": "code",
"execution_count": 359,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>descr</th>\n",
" <th>len</th>\n",
" <th>num_occur</th>\n",
" <th>assoc_obj_ids</th>\n",
" <th>num_assoc_obj_ids</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>162</th>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>66</td>\n",
" <td>92592</td>\n",
" <td>[0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53...</td>\n",
" <td>206</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>Wöchentliche Sichtkontrolle / Reinigung</td>\n",
" <td>39</td>\n",
" <td>1654</td>\n",
" <td>[301, 304, 305, 313, 314, 331, 332, 510, 511, ...</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>131</th>\n",
" <td>Tägliche Überprüfung der Ölabscheider</td>\n",
" <td>37</td>\n",
" <td>1616</td>\n",
" <td>[0, 970, 2134, 2137]</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>160</th>\n",
" <td>Wöchentliche Kontrolle der WC-Anlagen</td>\n",
" <td>37</td>\n",
" <td>1265</td>\n",
" <td>[1352, 1353, 1354, 1684, 1685, 1686, 1687, 168...</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>140</th>\n",
" <td>Halbjährliche Kontrolle des Stabbreithalters</td>\n",
" <td>44</td>\n",
" <td>687</td>\n",
" <td>[51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6...</td>\n",
" <td>166</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2679</th>\n",
" <td>Zahnräder der Laufkatze verschlissen Ersatztei...</td>\n",
" <td>170</td>\n",
" <td>1</td>\n",
" <td>[415]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2678</th>\n",
" <td>Bitte 8 Scheiben nach Muster anfertigen. Danke.</td>\n",
" <td>48</td>\n",
" <td>1</td>\n",
" <td>[140]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2677</th>\n",
" <td>Schalter für Bühne Schwenken abgerissen, bitte...</td>\n",
" <td>126</td>\n",
" <td>1</td>\n",
" <td>[323]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2676</th>\n",
" <td>Docke angefahren!</td>\n",
" <td>17</td>\n",
" <td>1</td>\n",
" <td>[176]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6799</th>\n",
" <td>Befestigung Deckel für Batteriefach defekt ...</td>\n",
" <td>107</td>\n",
" <td>1</td>\n",
" <td>[326]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6800 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" descr len num_occur \\\n",
"162 Tägliche Wartungstätigkeiten nach Vorgabe des ... 66 92592 \n",
"33 Wöchentliche Sichtkontrolle / Reinigung 39 1654 \n",
"131 Tägliche Überprüfung der Ölabscheider 37 1616 \n",
"160 Wöchentliche Kontrolle der WC-Anlagen 37 1265 \n",
"140 Halbjährliche Kontrolle des Stabbreithalters 44 687 \n",
"... ... ... ... \n",
"2679 Zahnräder der Laufkatze verschlissen Ersatztei... 170 1 \n",
"2678 Bitte 8 Scheiben nach Muster anfertigen. Danke. 48 1 \n",
"2677 Schalter für Bühne Schwenken abgerissen, bitte... 126 1 \n",
"2676 Docke angefahren! 17 1 \n",
"6799 Befestigung Deckel für Batteriefach defekt ... 107 1 \n",
"\n",
" assoc_obj_ids num_assoc_obj_ids \n",
"162 [0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53... 206 \n",
"33 [301, 304, 305, 313, 314, 331, 332, 510, 511, ... 18 \n",
"131 [0, 970, 2134, 2137] 4 \n",
"160 [1352, 1353, 1354, 1684, 1685, 1686, 1687, 168... 11 \n",
"140 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6... 166 \n",
"... ... ... \n",
"2679 [415] 1 \n",
"2678 [140] 1 \n",
"2677 [323] 1 \n",
"2676 [176] 1 \n",
"6799 [326] 1 \n",
"\n",
"[6800 rows x 5 columns]"
]
},
"execution_count": 359,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp1"
]
},
{
"cell_type": "code",
"execution_count": 360,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Tägliche Wartungstätigkeiten nach Vorgabe des Maschinenherstellers'"
]
},
"execution_count": 360,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp1.iloc[0,0]"
]
},
{
"cell_type": "code",
"execution_count": 361,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Wöchentliche Sichtkontrolle / Reinigung'"
]
},
"execution_count": 361,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp1.iloc[1,0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Cosine Similarity**"
]
},
{
"cell_type": "code",
"execution_count": 362,
"metadata": {},
"outputs": [],
"source": [
"# drop descriptions with fewer than 6 characters\n",
"subset_data = temp1.loc[temp1['len'] > 5, 'descr'].copy()\n",
"# temp1 is sorted by num_occur, so this keeps the 100 most frequent descriptions\n",
"subset_data = subset_data.iloc[0:100]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- How should unknown words be handled?"
]
},
{
"cell_type": "code",
"execution_count": 363,
"metadata": {},
"outputs": [],
"source": [
"# build mapping of embeddings for given model\n",
"def build_embedding_map(\n",
"    data: Series,\n",
"    model: GermanSpacyModel | SentenceTransformer,\n",
") -> tuple[dict[int, tuple['Embedding', str]], tuple[bool, bool]]:\n",
"    # dictionary with embeddings: index -> (embedding, original text)\n",
"    embeddings: dict[int, tuple['Embedding', str]] = dict()\n",
"    is_spacy = False\n",
"    is_STRF = False\n",
"\n",
"    if isinstance(model, spacy.lang.de.German):\n",
"        is_spacy = True\n",
"    elif isinstance(model, SentenceTransformer):\n",
"        is_STRF = True\n",
"\n",
"    if not any((is_spacy, is_STRF)):\n",
"        raise NotImplementedError(\"Model type unknown\")\n",
"\n",
"    # iterate over the passed data, not over a global variable\n",
"    for (idx, text) in data.items():\n",
"\n",
"        if is_spacy:\n",
"            embd = model(text)\n",
"            embeddings[idx] = (embd, text)\n",
"            # check for empty vectors, i.e. texts made up only of unknown words\n",
"            if not embd.vector_norm:\n",
"                print('--- Unknown Words ---')\n",
"                print(f'{embd.text=} has no vector')\n",
"        elif is_STRF:\n",
"            embd = model.encode(text, show_progress_bar=False, normalize_embeddings=False)\n",
"            embeddings[idx] = (embd, text)\n",
"\n",
"    return embeddings, (is_spacy, is_STRF)\n",
"\n",
"# build similarity matrix out of embeddings\n",
"def build_cosSim_matrix(\n",
"    data: Series,\n",
"    model: GermanSpacyModel | SentenceTransformer,\n",
") -> tuple[DataFrame, dict[int, tuple['Embedding', str]]]:\n",
"    # build empty matrix; only the upper triangle (idx1 before idx2) is filled below\n",
"    df_index = data.index\n",
"    cosineSim_idx_matrix = pd.DataFrame(data=0., columns=df_index,\n",
"                                        index=df_index, dtype=np.float32)\n",
"\n",
"    # obtain embeddings based on used model\n",
"    embds, (is_spacy, is_STRF) = build_embedding_map(\n",
"        data=data,\n",
"        model=model\n",
"    )\n",
"\n",
"    # apply index based mapping for efficient handling of large texts\n",
"    combs = combinations(df_index, 2)\n",
"\n",
"    for (idx1, idx2) in combs:\n",
"        #print(f\"{idx1=}, {idx2=}\")\n",
"        embd1 = embds[idx1][0]\n",
"        embd2 = embds[idx2][0]\n",
"\n",
"        # calculate similarity based on model type\n",
"        if is_spacy:\n",
"            cosSim = embd1.similarity(embd2)\n",
"        elif is_STRF:\n",
"            # cos_sim returns a 1 x 1 tensor; item() extracts the float\n",
"            cosSim = sentence_transformers.util.cos_sim(embd1, embd2)\n",
"            cosSim = cosSim.item()\n",
"\n",
"        cosineSim_idx_matrix.at[idx1, idx2] = cosSim\n",
"\n",
"    return cosineSim_idx_matrix, embds"
]
},
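{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Sketch: upper-triangular cosine-similarity matrix in one matrix product**\n",
"\n",
"`build_cosSim_matrix` fills the upper triangle pair by pair, which means n*(n-1)/2 similarity calls for n texts. If the embeddings are stacked into one array of shape `(n_texts, dim)`, the same layout can be computed with a single matrix product of row-normalised vectors. This is a minimal sketch under that assumption. `cos_sim_upper` is a hypothetical helper, and for the SentenceTransformer path the array could, for example, come from `model_stfr.encode(subset_data.tolist())`. The vectors in the usage example are made up."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"\n",
"def cos_sim_upper(embeddings: np.ndarray, index) -> pd.DataFrame:\n",
"    # hypothetical helper: embeddings has shape (n_texts, dim), one row per text\n",
"    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)\n",
"    # avoid division by zero for all-zero vectors; their similarities stay 0\n",
"    unit = embeddings / np.where(norms == 0, 1.0, norms)\n",
"    sim = unit @ unit.T\n",
"    # keep pairs i < j only, like combinations(df_index, 2); diagonal and lower triangle are 0\n",
"    sim = np.triu(sim, k=1).astype(np.float32)\n",
"    return pd.DataFrame(sim, index=index, columns=index)\n",
"\n",
"\n",
"# usage with made-up 2-d vectors\n",
"vecs = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])\n",
"cos_sim_upper(vecs, index=[162, 33, 131]).round(3)"
]
},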
{
"cell_type": "code",
"execution_count": 364,
"metadata": {},
"outputs": [],
"source": [
"cosineSim_idx_matrix, embds = build_cosSim_matrix(\n",
"    data=subset_data,\n",
"    model=model_stfr,\n",
")"
]
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 365,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>162</th>\n",
|
||
" <th>33</th>\n",
|
||
" <th>131</th>\n",
|
||
" <th>160</th>\n",
|
||
" <th>140</th>\n",
|
||
" <th>1780</th>\n",
|
||
" <th>332</th>\n",
|
||
" <th>104</th>\n",
|
||
" <th>157</th>\n",
|
||
" <th>558</th>\n",
|
||
" <th>...</th>\n",
|
||
" <th>180</th>\n",
|
||
" <th>3485</th>\n",
|
||
" <th>2255</th>\n",
|
||
" <th>81</th>\n",
|
||
" <th>360</th>\n",
|
||
" <th>47</th>\n",
|
||
" <th>2951</th>\n",
|
||
" <th>185</th>\n",
|
||
" <th>566</th>\n",
|
||
" <th>40</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>162</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.441387</td>\n",
|
||
" <td>0.409547</td>\n",
|
||
" <td>0.307963</td>\n",
|
||
" <td>0.324018</td>\n",
|
||
" <td>0.506761</td>\n",
|
||
" <td>0.475413</td>\n",
|
||
" <td>0.475614</td>\n",
|
||
" <td>0.491961</td>\n",
|
||
" <td>0.472069</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.306548</td>\n",
|
||
" <td>0.318907</td>\n",
|
||
" <td>0.329199</td>\n",
|
||
" <td>0.296131</td>\n",
|
||
" <td>0.283268</td>\n",
|
||
" <td>0.442444</td>\n",
|
||
" <td>0.129318</td>\n",
|
||
" <td>0.425916</td>\n",
|
||
" <td>0.432691</td>\n",
|
||
" <td>0.356977</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>33</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.298110</td>\n",
|
||
" <td>0.372992</td>\n",
|
||
" <td>0.412453</td>\n",
|
||
" <td>0.374439</td>\n",
|
||
" <td>0.423904</td>\n",
|
||
" <td>0.416100</td>\n",
|
||
" <td>0.717584</td>\n",
|
||
" <td>0.422673</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.317514</td>\n",
|
||
" <td>0.321114</td>\n",
|
||
" <td>0.367475</td>\n",
|
||
" <td>0.327464</td>\n",
|
||
" <td>0.228003</td>\n",
|
||
" <td>0.351899</td>\n",
|
||
" <td>0.245888</td>\n",
|
||
" <td>0.383551</td>\n",
|
||
" <td>0.384033</td>\n",
|
||
" <td>0.746593</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>131</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.305271</td>\n",
|
||
" <td>0.390110</td>\n",
|
||
" <td>0.406878</td>\n",
|
||
" <td>0.390903</td>\n",
|
||
" <td>0.417179</td>\n",
|
||
" <td>0.324945</td>\n",
|
||
" <td>0.392856</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.387864</td>\n",
|
||
" <td>0.386872</td>\n",
|
||
" <td>0.466728</td>\n",
|
||
" <td>0.368427</td>\n",
|
||
" <td>0.297099</td>\n",
|
||
" <td>0.393476</td>\n",
|
||
" <td>0.080983</td>\n",
|
||
" <td>0.344004</td>\n",
|
||
" <td>0.346553</td>\n",
|
||
" <td>0.300196</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>160</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.294035</td>\n",
|
||
" <td>0.293377</td>\n",
|
||
" <td>0.457293</td>\n",
|
||
" <td>0.251860</td>\n",
|
||
" <td>0.327785</td>\n",
|
||
" <td>0.456575</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.291295</td>\n",
|
||
" <td>0.356851</td>\n",
|
||
" <td>0.326423</td>\n",
|
||
" <td>0.340315</td>\n",
|
||
" <td>0.241496</td>\n",
|
||
" <td>0.363125</td>\n",
|
||
" <td>0.205827</td>\n",
|
||
" <td>0.350013</td>\n",
|
||
" <td>0.322723</td>\n",
|
||
" <td>0.233216</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>140</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.353114</td>\n",
|
||
" <td>0.368328</td>\n",
|
||
" <td>0.319977</td>\n",
|
||
" <td>0.402378</td>\n",
|
||
" <td>0.368687</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.328528</td>\n",
|
||
" <td>0.298065</td>\n",
|
||
" <td>0.515159</td>\n",
|
||
" <td>0.315984</td>\n",
|
||
" <td>0.240238</td>\n",
|
||
" <td>0.406395</td>\n",
|
||
" <td>0.164005</td>\n",
|
||
" <td>0.405763</td>\n",
|
||
" <td>0.403172</td>\n",
|
||
" <td>0.381799</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>47</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.119025</td>\n",
|
||
" <td>0.294950</td>\n",
|
||
" <td>0.281203</td>\n",
|
||
" <td>0.317069</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2951</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.359434</td>\n",
|
||
" <td>0.353695</td>\n",
|
||
" <td>0.223206</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>185</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.978342</td>\n",
|
||
" <td>0.411086</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>566</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.404999</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>40</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>100 rows × 100 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" 162 33 131 160 140 1780 332 \\\n",
|
||
"162 0.0 0.441387 0.409547 0.307963 0.324018 0.506761 0.475413 \n",
|
||
"33 0.0 0.000000 0.298110 0.372992 0.412453 0.374439 0.423904 \n",
|
||
"131 0.0 0.000000 0.000000 0.305271 0.390110 0.406878 0.390903 \n",
|
||
"160 0.0 0.000000 0.000000 0.000000 0.294035 0.293377 0.457293 \n",
|
||
"140 0.0 0.000000 0.000000 0.000000 0.000000 0.353114 0.368328 \n",
|
||
"... ... ... ... ... ... ... ... \n",
|
||
"47 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
|
||
"2951 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
|
||
"185 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
|
||
"566 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
|
||
"40 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
|
||
"\n",
|
||
" 104 157 558 ... 180 3485 2255 \\\n",
|
||
"162 0.475614 0.491961 0.472069 ... 0.306548 0.318907 0.329199 \n",
|
||
"33 0.416100 0.717584 0.422673 ... 0.317514 0.321114 0.367475 \n",
|
||
"131 0.417179 0.324945 0.392856 ... 0.387864 0.386872 0.466728 \n",
|
||
"160 0.251860 0.327785 0.456575 ... 0.291295 0.356851 0.326423 \n",
|
||
"140 0.319977 0.402378 0.368687 ... 0.328528 0.298065 0.515159 \n",
|
||
"... ... ... ... ... ... ... ... \n",
|
||
"47 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 \n",
|
||
"2951 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 \n",
|
||
"185 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 \n",
|
||
"566 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 \n",
|
||
"40 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 \n",
|
||
"\n",
|
||
" 81 360 47 2951 185 566 40 \n",
|
||
"162 0.296131 0.283268 0.442444 0.129318 0.425916 0.432691 0.356977 \n",
|
||
"33 0.327464 0.228003 0.351899 0.245888 0.383551 0.384033 0.746593 \n",
|
||
"131 0.368427 0.297099 0.393476 0.080983 0.344004 0.346553 0.300196 \n",
|
||
"160 0.340315 0.241496 0.363125 0.205827 0.350013 0.322723 0.233216 \n",
|
||
"140 0.315984 0.240238 0.406395 0.164005 0.405763 0.403172 0.381799 \n",
|
||
"... ... ... ... ... ... ... ... \n",
|
||
"47 0.000000 0.000000 0.000000 0.119025 0.294950 0.281203 0.317069 \n",
|
||
"2951 0.000000 0.000000 0.000000 0.000000 0.359434 0.353695 0.223206 \n",
|
||
"185 0.000000 0.000000 0.000000 0.000000 0.000000 0.978342 0.411086 \n",
|
||
"566 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.404999 \n",
|
||
"40 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
|
||
"\n",
|
||
"[100 rows x 100 columns]"
|
||
]
|
||
},
|
||
"execution_count": 365,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"cosineSim_idx_matrix"
|
||
]
|
||
},
{
"cell_type": "code",
"execution_count": 366,
"metadata": {},
"outputs": [],
"source": [
"# obtain index pairs with a cosine similarity\n",
"# greater than or equal to the given threshold value\n",
"\n",
"def filt_thresh_cosSim_matrix(\n",
"    threshold: float,\n",
"    cosineSim_idx_matrix: DataFrame,\n",
") -> Series:\n",
"    # cells below the threshold become NaN; drop them explicitly,\n",
"    # since newer pandas versions of stack() no longer drop NaN values\n",
"    cosineSim_filt = cosineSim_idx_matrix.where(cosineSim_idx_matrix >= threshold).stack().dropna()\n",
"\n",
"    return cosineSim_filt\n",
"\n",
"def list_cosSim_dupl_candidates(\n",
"    cosineSim_filt: Series,\n",
"    embeddings: dict[int, tuple['Embedding',str]],\n",
") -> tuple[DataFrame, list[tuple['Index', 'Index']]]:\n",
"    # collect all candidate pairs without asking for confirmation\n",
"    columns = ['idx1', 'text1', 'idx2', 'text2', 'score']\n",
"    df_candidates = pd.DataFrame(columns=columns)\n",
"\n",
"    index_pairs = list()\n",
"\n",
"    for ((idx1, idx2), score) in cosineSim_filt.items():\n",
"        # get text content from embedding as second tuple entry\n",
"        content = [[\n",
"            idx1,\n",
"            embeddings[idx1][1],\n",
"            idx2,\n",
"            embeddings[idx2][1],\n",
"            score,\n",
"        ]]\n",
"        df_conc = pd.DataFrame(columns=columns, data=content)\n",
"\n",
"        df_candidates = pd.concat([df_candidates, df_conc])\n",
"        index_pairs.append((idx1, idx2))\n",
"\n",
"    return df_candidates, index_pairs\n",
"\n",
"def choose_cosSim_dupl_candidates(\n",
"    cosineSim_filt: Series,\n",
"    embeddings: dict[int, tuple['Embedding',str]],\n",
") -> tuple[DataFrame, list[tuple['Index', 'Index']]]:\n",
"    # let the user confirm each candidate pair interactively\n",
"    columns = ['idx1', 'text1', 'idx2', 'text2', 'score']\n",
"    df_candidates = pd.DataFrame(columns=columns)\n",
"\n",
"    index_pairs = list()\n",
"\n",
"    for ((idx1, idx2), score) in cosineSim_filt.items():\n",
"        # get texts for comparison\n",
"        text1 = embeddings[idx1][1]\n",
"        text2 = embeddings[idx2][1]\n",
"        # get decision\n",
"        print('---------- New Decision ----------')\n",
"        print('text1:\\n', text1, '\\n', flush=True)\n",
"        print('text2:\\n', text2, '\\n', flush=True)\n",
"        decision = input('Please enter >>y<< if this is a duplicate, else hit enter:')\n",
"\n",
"        if not decision == 'y':\n",
"            continue\n",
"\n",
"        # store the confirmed pair\n",
"        content = [[\n",
"            idx1,\n",
"            text1,\n",
"            idx2,\n",
"            text2,\n",
"            score,\n",
"        ]]\n",
"        df_conc = pd.DataFrame(columns=columns, data=content)\n",
"\n",
"        df_candidates = pd.concat([df_candidates, df_conc])\n",
"        index_pairs.append((idx1, idx2))\n",
"\n",
"    return df_candidates, index_pairs"
]
},
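{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Sketch: candidate table without row-wise `pd.concat`**\n",
"\n",
"`list_cosSim_dupl_candidates` concatenates one single-row frame per pair, which is why every row in its output carries the index `0`. The cell below is a minimal sketch that builds the same columns in one step. It assumes `sim` is an upper-triangular similarity matrix such as `cosineSim_idx_matrix` and `texts` maps each index to its description (for example `subset_data`). `dupl_candidates_table` is a hypothetical helper, and the 3 x 3 matrix is made up."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"\n",
"def dupl_candidates_table(sim: pd.DataFrame, texts: pd.Series, threshold: float) -> pd.DataFrame:\n",
"    # hypothetical helper: one row per index pair with similarity >= threshold\n",
"    # where() turns all other cells into NaN; dropna() removes them explicitly\n",
"    pairs = sim.where(sim >= threshold).stack().dropna()\n",
"    idx1 = pairs.index.get_level_values(0)\n",
"    idx2 = pairs.index.get_level_values(1)\n",
"    return pd.DataFrame({\n",
"        'idx1': idx1,\n",
"        'text1': texts.loc[idx1].to_numpy(),\n",
"        'idx2': idx2,\n",
"        'text2': texts.loc[idx2].to_numpy(),\n",
"        'score': pairs.to_numpy(),\n",
"    })\n",
"\n",
"\n",
"# usage with a made-up 3 x 3 upper-triangular matrix\n",
"texts = pd.Series({10: 'Monatliche Prüfung von Scharnieren', 11: 'Prüfung von Scharnieren', 12: 'Reinigung'})\n",
"sim = pd.DataFrame([[0.0, 0.95, 0.20],\n",
"                    [0.0, 0.0, 0.30],\n",
"                    [0.0, 0.0, 0.0]], index=texts.index, columns=texts.index)\n",
"dupl_candidates_table(sim, texts, threshold=0.8)"
]
},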
{
"cell_type": "code",
"execution_count": 367,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"33 176 0.921449\n",
" 247 0.903092\n",
"332 558 0.987194\n",
"157 247 0.812700\n",
"176 247 0.816763\n",
"34 63 0.952310\n",
"477 247 0.831053\n",
"111 360 0.991955\n",
"53 56 0.866648\n",
" 15 0.871172\n",
"56 15 0.989507\n",
"84 191 0.999377\n",
"28 173 0.836900\n",
"184 40 0.959962\n",
"602 255 0.800500\n",
"29 78 0.939677\n",
"732 185 0.815442\n",
"136 174 0.943705\n",
"680 106 0.889502\n",
"6580 3371 0.866680\n",
"185 566 0.978342\n",
"dtype: float32"
]
},
"execution_count": 367,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"SIMILARITY_THRESHOLD = 0.8\n",
"\n",
"cosineSim_filt = filt_thresh_cosSim_matrix(\n",
"    threshold=SIMILARITY_THRESHOLD,\n",
"    cosineSim_idx_matrix=cosineSim_idx_matrix,\n",
")\n",
"cosineSim_filt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>idx1</th>\n",
" <th>text1</th>\n",
" <th>idx2</th>\n",
" <th>text2</th>\n",
" <th>score</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>332</td>\n",
" <td>Prüfung von: - Scharniere - Dichtung - Schlie...</td>\n",
" <td>558</td>\n",
" <td>Monatliche Prüfung von: - Scharniere - Dichtu...</td>\n",
" <td>0.987194</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>111</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten durch die...</td>\n",
" <td>360</td>\n",
" <td>Wöchentliche Interne Wartungstätigkeiten durch...</td>\n",
" <td>0.991955</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>56</td>\n",
" <td>Vorgaben aus Brückner Wartungsplan (siehe Extr...</td>\n",
" <td>15</td>\n",
" <td>Vorgaben aus Brückner Wartungsplan siehe Extr...</td>\n",
" <td>0.989507</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>84</td>\n",
" <td>Vorgabe aus Wartungsplan Firma Menzel (siehe V...</td>\n",
" <td>191</td>\n",
" <td>Vorgabe aus Wartungsplan Firma Menzel (siehe V...</td>\n",
" <td>0.999377</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>185</td>\n",
" <td>Vorgabe aus Wartungsplan Firma Menzel (siehe V...</td>\n",
" <td>566</td>\n",
" <td>Vorgabe aus Wartungsplan Firma Menzel (siehe V...</td>\n",
" <td>0.978342</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" idx1 text1 idx2 \\\n",
"0 332 Prüfung von: - Scharniere - Dichtung - Schlie... 558 \n",
"0 111 Tägliche Interne Wartungstätigkeiten durch die... 360 \n",
"0 56 Vorgaben aus Brückner Wartungsplan (siehe Extr... 15 \n",
"0 84 Vorgabe aus Wartungsplan Firma Menzel (siehe V... 191 \n",
"0 185 Vorgabe aus Wartungsplan Firma Menzel (siehe V... 566 \n",
"\n",
" text2 score \n",
"0 Monatliche Prüfung von: - Scharniere - Dichtu... 0.987194 \n",
"0 Wöchentliche Interne Wartungstätigkeiten durch... 0.991955 \n",
"0 Vorgaben aus Brückner Wartungsplan siehe Extr... 0.989507 \n",
"0 Vorgabe aus Wartungsplan Firma Menzel (siehe V... 0.999377 \n",
"0 Vorgabe aus Wartungsplan Firma Menzel (siehe V... 0.978342 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"cosSim_dupl_candidates, dupl_idx_pairs = list_cosSim_dupl_candidates(\n",
"    cosineSim_filt=cosineSim_filt,\n",
"    embeddings=embds,\n",
")\n",
"# save results\n",
"SAVE_PATH_DUPL_CANDIDATES = (f'./Filterung_Duplikate/dupl_candidates_'\n",
"                             f'cosSim_thresh_{SIMILARITY_THRESHOLD}.xlsx')\n",
"#cosSim_dupl_candidates.to_excel(SAVE_PATH_DUPL_CANDIDATES)\n",
"cosSim_dupl_candidates"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Next steps:**\n",
"- Find the limiting threshold at which duplicates are still detected correctly"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"thresholds = (0.75, 0.8, 0.85, 0.9, 0.93, 0.95, 0.96, 0.97, 0.98)\n",
"\n",
"for thresh in thresholds:\n",
"\n",
"    cosineSim_filt = filt_thresh_cosSim_matrix(\n",
"        threshold=thresh,\n",
"        cosineSim_idx_matrix=cosineSim_idx_matrix.copy(),\n",
"    )\n",
"\n",
"    # the function returns (candidates, index_pairs); only the table is saved\n",
"    cosSim_dupl_candidates, _ = list_cosSim_dupl_candidates(\n",
"        cosineSim_filt=cosineSim_filt,\n",
"        embeddings=embds,\n",
"    )\n",
"\n",
"    # saving path\n",
"    saving_path = (f'./Filterung_Duplikate/dupl_candidates_'\n",
"                   f'cosSim_thresh_{thresh}_STFR.xlsx')\n",
"\n",
"    cosSim_dupl_candidates.to_excel(saving_path)"
]
},
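{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Sketch: number of candidate pairs per threshold**\n",
"\n",
"Before opening the Excel files written by the sweep above, a quick count of how many pairs each threshold keeps can show where the candidate list starts to grow. This is a minimal sketch, assuming the layout of `cosineSim_idx_matrix` (similarities only above the diagonal, zeros elsewhere), so that any threshold greater than 0 counts only real pairs. `count_pairs_per_threshold` is a hypothetical helper, and the matrix in the example is made up."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"\n",
"def count_pairs_per_threshold(sim: pd.DataFrame, thresholds) -> pd.Series:\n",
"    # hypothetical helper: assumes only the upper triangle holds similarities\n",
"    values = sim.to_numpy()\n",
"    counts = {t: int((values >= t).sum()) for t in thresholds}\n",
"    return pd.Series(counts, name='num_pairs')\n",
"\n",
"\n",
"# usage with a made-up matrix; in the notebook this would be cosineSim_idx_matrix\n",
"sim = pd.DataFrame([[0.0, 0.95, 0.82],\n",
"                    [0.0, 0.0, 0.30],\n",
"                    [0.0, 0.0, 0.0]])\n",
"count_pairs_per_threshold(sim, thresholds=(0.8, 0.9, 0.97))"
]
},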
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Results:**\n",
"- no general threshold can be derived, only a rough guideline\n",
"- in places, pairs with a lower score are more similar than pairs with a higher score\n",
"- the final duplicate decision is made by hand, since context knowledge is still required\n",
"- continue working with ``temp1`` and merge the duplicate entries"
]
},
{
"cell_type": "code",
"execution_count": 368,
"metadata": {},
"outputs": [],
"source": [
"# manually decide if candidates are indeed duplicates\n",
"\n",
"SKIP = True\n",
"if not SKIP:\n",
"    cosSim_dupl_candidates, dupl_idx_pairs = choose_cosSim_dupl_candidates(\n",
"        cosineSim_filt=cosineSim_filt,\n",
"        embeddings=embds,\n",
"    )"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#save_pickle(obj=dupl_idx_pairs, path='./Filterung_Duplikate/dupl_idx_pairs_Exp4.pkl')"
]
},
{
"cell_type": "code",
"execution_count": 369,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(33, 176),\n",
" (332, 558),\n",
" (34, 63),\n",
" (53, 56),\n",
" (53, 15),\n",
" (56, 15),\n",
" (84, 191),\n",
" (29, 78),\n",
" (136, 174),\n",
" (680, 106),\n",
" (185, 566)]"
]
},
"execution_count": 369,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dupl_idx_pairs = load_pickle(path='./Filterung_Duplikate/dupl_idx_pairs_Exp4.pkl')\n",
"dupl_idx_pairs"
]
},
{
"cell_type": "code",
"execution_count": 433,
"metadata": {},
"outputs": [],
"source": [
"temp2 = temp1.copy()"
]
},
{
"cell_type": "code",
"execution_count": 434,
"metadata": {},
"outputs": [],
"source": [
"# merge duplicates\n",
"\n",
"# for every confirmed pair (i1, i2):\n",
"# merge 'num_occur' and 'assoc_obj_ids' into i1,\n",
"# recalculate 'num_assoc_obj_ids' and drop i2\n",
"\n",
"for (i1, i2) in dupl_idx_pairs:\n",
"\n",
"    # if an entry does not exist anymore (already merged), skip this pair\n",
"    if i1 not in temp2.index or i2 not in temp2.index:\n",
"        continue\n",
"\n",
"    # merge num occur\n",
"    num_occur1 = temp2.at[i1, 'num_occur']\n",
"    num_occur2 = temp2.at[i2, 'num_occur']\n",
"    new_num_occur = num_occur1 + num_occur2\n",
"\n",
"    # merge assoc obj ids\n",
"    assoc_ids1 = temp2.at[i1, 'assoc_obj_ids']\n",
"    assoc_ids2 = temp2.at[i2, 'assoc_obj_ids']\n",
"    new_assoc_ids = np.append(assoc_ids1, assoc_ids2)\n",
"    new_assoc_ids = np.unique(new_assoc_ids.flatten())\n",
"\n",
"    # recalc num assoc obj ids\n",
"    new_num_assoc_obj_ids = len(new_assoc_ids)\n",
"\n",
"    # write merged properties to the first entry\n",
"    temp2.at[i1, 'num_occur'] = new_num_occur\n",
"    temp2.at[i1, 'assoc_obj_ids'] = new_assoc_ids\n",
"    temp2.at[i1, 'num_assoc_obj_ids'] = new_num_assoc_obj_ids\n",
"\n",
"    # drop second entry\n",
"    temp2 = temp2.drop(index=i2)"
]
},
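{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Sketch: grouping duplicate pairs transitively**\n",
"\n",
"The merge loop above skips a pair as soon as one of its indices has already been dropped, so a chain such as `(a, b)`, `(b, c)` would leave `c` unmerged. A union-find over the confirmed pairs collects every chain into one group, whose members could then be merged into the group's root row with the same logic as above. This is a minimal sketch. `group_duplicate_pairs` is a hypothetical helper, and the pairs in the example are a subset of `dupl_idx_pairs`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def group_duplicate_pairs(pairs):\n",
"    # hypothetical helper: union-find over index pairs, returns {root: [members]}\n",
"    parent = {}\n",
"\n",
"    def find(x):\n",
"        parent.setdefault(x, x)\n",
"        while parent[x] != x:\n",
"            parent[x] = parent[parent[x]]  # path halving keeps the trees flat\n",
"            x = parent[x]\n",
"        return x\n",
"\n",
"    for a, b in pairs:\n",
"        root_a, root_b = find(a), find(b)\n",
"        if root_a != root_b:\n",
"            # attach the group of b below the root of the group of a\n",
"            parent[root_b] = root_a\n",
"\n",
"    groups = {}\n",
"    for x in parent:\n",
"        groups.setdefault(find(x), []).append(x)\n",
"    return groups\n",
"\n",
"\n",
"# usage with some of the pairs stored in dupl_idx_pairs\n",
"group_duplicate_pairs([(53, 56), (53, 15), (56, 15), (84, 191)])"
]
},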
{
"cell_type": "code",
"execution_count": 435,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>descr</th>\n",
" <th>len</th>\n",
" <th>num_occur</th>\n",
" <th>assoc_obj_ids</th>\n",
" <th>num_assoc_obj_ids</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>162</th>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>66</td>\n",
" <td>92592</td>\n",
" <td>[0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53...</td>\n",
" <td>206</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>Wöchentliche Sichtkontrolle / Reinigung</td>\n",
" <td>39</td>\n",
" <td>1654</td>\n",
" <td>[301, 304, 305, 313, 314, 331, 332, 510, 511, ...</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>131</th>\n",
" <td>Tägliche Überprüfung der Ölabscheider</td>\n",
" <td>37</td>\n",
" <td>1616</td>\n",
" <td>[0, 970, 2134, 2137]</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>160</th>\n",
" <td>Wöchentliche Kontrolle der WC-Anlagen</td>\n",
" <td>37</td>\n",
" <td>1265</td>\n",
" <td>[1352, 1353, 1354, 1684, 1685, 1686, 1687, 168...</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>140</th>\n",
" <td>Halbjährliche Kontrolle des Stabbreithalters</td>\n",
" <td>44</td>\n",
" <td>687</td>\n",
" <td>[51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6...</td>\n",
" <td>166</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2679</th>\n",
" <td>Zahnräder der Laufkatze verschlissen Ersatztei...</td>\n",
" <td>170</td>\n",
" <td>1</td>\n",
" <td>[415]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2678</th>\n",
" <td>Bitte 8 Scheiben nach Muster anfertigen. Danke.</td>\n",
" <td>48</td>\n",
" <td>1</td>\n",
" <td>[140]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2677</th>\n",
" <td>Schalter für Bühne Schwenken abgerissen, bitte...</td>\n",
" <td>126</td>\n",
" <td>1</td>\n",
" <td>[323]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2676</th>\n",
" <td>Docke angefahren!</td>\n",
|
||
" <td>17</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>[176]</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6799</th>\n",
|
||
" <td>Befestigung Deckel für Batteriefach defekt ...</td>\n",
|
||
" <td>107</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>[326]</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>6800 rows × 5 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" descr len num_occur \\\n",
|
||
"162 Tägliche Wartungstätigkeiten nach Vorgabe des ... 66 92592 \n",
|
||
"33 Wöchentliche Sichtkontrolle / Reinigung 39 1654 \n",
|
||
"131 Tägliche Überprüfung der Ölabscheider 37 1616 \n",
|
||
"160 Wöchentliche Kontrolle der WC-Anlagen 37 1265 \n",
|
||
"140 Halbjährliche Kontrolle des Stabbreithalters 44 687 \n",
|
||
"... ... ... ... \n",
|
||
"2679 Zahnräder der Laufkatze verschlissen Ersatztei... 170 1 \n",
|
||
"2678 Bitte 8 Scheiben nach Muster anfertigen. Danke. 48 1 \n",
|
||
"2677 Schalter für Bühne Schwenken abgerissen, bitte... 126 1 \n",
|
||
"2676 Docke angefahren! 17 1 \n",
|
||
"6799 Befestigung Deckel für Batteriefach defekt ... 107 1 \n",
|
||
"\n",
|
||
" assoc_obj_ids num_assoc_obj_ids \n",
|
||
"162 [0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53... 206 \n",
|
||
"33 [301, 304, 305, 313, 314, 331, 332, 510, 511, ... 18 \n",
|
||
"131 [0, 970, 2134, 2137] 4 \n",
|
||
"160 [1352, 1353, 1354, 1684, 1685, 1686, 1687, 168... 11 \n",
|
||
"140 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6... 166 \n",
|
||
"... ... ... \n",
|
||
"2679 [415] 1 \n",
|
||
"2678 [140] 1 \n",
|
||
"2677 [323] 1 \n",
|
||
"2676 [176] 1 \n",
|
||
"6799 [326] 1 \n",
|
||
"\n",
|
||
"[6800 rows x 5 columns]"
|
||
]
|
||
},
|
||
"execution_count": 435,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"temp1"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 436,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"temp2['assoc_obj_ids'] = temp2['assoc_obj_ids'].map(lambda x: x.tolist())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 437,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>descr</th>\n",
|
||
" <th>len</th>\n",
|
||
" <th>num_occur</th>\n",
|
||
" <th>assoc_obj_ids</th>\n",
|
||
" <th>num_assoc_obj_ids</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>162</th>\n",
|
||
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
|
||
" <td>66</td>\n",
|
||
" <td>92592</td>\n",
|
||
" <td>[0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53...</td>\n",
|
||
" <td>206</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>33</th>\n",
|
||
" <td>Wöchentliche Sichtkontrolle / Reinigung</td>\n",
|
||
" <td>39</td>\n",
|
||
" <td>2015</td>\n",
|
||
" <td>[301, 304, 305, 313, 314, 323, 329, 331, 332, ...</td>\n",
|
||
" <td>23</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>131</th>\n",
|
||
" <td>Tägliche Überprüfung der Ölabscheider</td>\n",
|
||
" <td>37</td>\n",
|
||
" <td>1616</td>\n",
|
||
" <td>[0, 970, 2134, 2137]</td>\n",
|
||
" <td>4</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>160</th>\n",
|
||
" <td>Wöchentliche Kontrolle der WC-Anlagen</td>\n",
|
||
" <td>37</td>\n",
|
||
" <td>1265</td>\n",
|
||
" <td>[1352, 1353, 1354, 1684, 1685, 1686, 1687, 168...</td>\n",
|
||
" <td>11</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>140</th>\n",
|
||
" <td>Halbjährliche Kontrolle des Stabbreithalters</td>\n",
|
||
" <td>44</td>\n",
|
||
" <td>687</td>\n",
|
||
" <td>[51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6...</td>\n",
|
||
" <td>166</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2679</th>\n",
|
||
" <td>Zahnräder der Laufkatze verschlissen Ersatztei...</td>\n",
|
||
" <td>170</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>[415]</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2678</th>\n",
|
||
" <td>Bitte 8 Scheiben nach Muster anfertigen. Danke.</td>\n",
|
||
" <td>48</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>[140]</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2677</th>\n",
|
||
" <td>Schalter für Bühne Schwenken abgerissen, bitte...</td>\n",
|
||
" <td>126</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>[323]</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2676</th>\n",
|
||
" <td>Docke angefahren!</td>\n",
|
||
" <td>17</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>[176]</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6799</th>\n",
|
||
" <td>Befestigung Deckel für Batteriefach defekt ...</td>\n",
|
||
" <td>107</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>[326]</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>6790 rows × 5 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" descr len num_occur \\\n",
|
||
"162 Tägliche Wartungstätigkeiten nach Vorgabe des ... 66 92592 \n",
|
||
"33 Wöchentliche Sichtkontrolle / Reinigung 39 2015 \n",
|
||
"131 Tägliche Überprüfung der Ölabscheider 37 1616 \n",
|
||
"160 Wöchentliche Kontrolle der WC-Anlagen 37 1265 \n",
|
||
"140 Halbjährliche Kontrolle des Stabbreithalters 44 687 \n",
|
||
"... ... ... ... \n",
|
||
"2679 Zahnräder der Laufkatze verschlissen Ersatztei... 170 1 \n",
|
||
"2678 Bitte 8 Scheiben nach Muster anfertigen. Danke. 48 1 \n",
|
||
"2677 Schalter für Bühne Schwenken abgerissen, bitte... 126 1 \n",
|
||
"2676 Docke angefahren! 17 1 \n",
|
||
"6799 Befestigung Deckel für Batteriefach defekt ... 107 1 \n",
|
||
"\n",
|
||
" assoc_obj_ids num_assoc_obj_ids \n",
|
||
"162 [0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53... 206 \n",
|
||
"33 [301, 304, 305, 313, 314, 323, 329, 331, 332, ... 23 \n",
|
||
"131 [0, 970, 2134, 2137] 4 \n",
|
||
"160 [1352, 1353, 1354, 1684, 1685, 1686, 1687, 168... 11 \n",
|
||
"140 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6... 166 \n",
|
||
"... ... ... \n",
|
||
"2679 [415] 1 \n",
|
||
"2678 [140] 1 \n",
|
||
"2677 [323] 1 \n",
|
||
"2676 [176] 1 \n",
|
||
"6799 [326] 1 \n",
|
||
"\n",
|
||
"[6790 rows x 5 columns]"
|
||
]
|
||
},
|
||
"execution_count": 437,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"temp2"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 438,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"LOAD_CALC_FILES = False\n",
|
||
"\n",
|
||
"# save/load dataframe\n",
|
||
"#FILE_PATH = 'VorgangsBeschreibung_analyse_20240306.fth'\n",
|
||
"FILE_PATH = 'VorgangsBeschreibung_analyse_20240306.pkl'\n",
|
||
"if LOAD_CALC_FILES:\n",
|
||
" temp1 = pd.read_feather(FILE_PATH)\n",
|
||
" temp1 = temp1.set_index('index')\n",
|
||
"else:\n",
|
||
" save_df = temp2.copy()\n",
|
||
" #save_df = temp2.reset_index()\n",
|
||
" #save_df.to_feather(FILE_PATH)\n",
|
||
" #save_df.to_parquet(FILE_PATH)\n",
|
||
" save_df.to_pickle(FILE_PATH)"
|
||
]
|
||
},
|
||
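{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "A pickle file stores the DataFrame as it is, including the non-default index and the list-valued `assoc_obj_ids` column. It can therefore be read back with `pd.read_pickle` without any `reset_index()`/`set_index('index')` step. The Feather variants in this notebook need that step because pandas' `to_feather` requires a default index. Below is a minimal round-trip check on a toy frame. It writes to a temporary directory instead of the real `FILE_PATH`, and the data are illustrative only."
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "import os\n",
  "import tempfile\n",
  "\n",
  "import pandas as pd\n",
  "\n",
  "# toy frame with a non-default index and a list-valued column, like temp2\n",
  "toy = pd.DataFrame(\n",
  "    {'descr': ['Wöchentliche Sichtkontrolle / Reinigung', 'Docke angefahren!'],\n",
  "     'assoc_obj_ids': [[301, 304], [176]]},\n",
  "    index=[33, 2676],\n",
  ")\n",
  "\n",
  "with tempfile.TemporaryDirectory() as tmp_dir:\n",
  "    path = os.path.join(tmp_dir, 'toy.pkl')  # stand-in for FILE_PATH\n",
  "    toy.to_pickle(path)\n",
  "    restored = pd.read_pickle(path)  # index and lists come back unchanged\n",
  "\n",
  "restored"
 ]
},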
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"- Handling von Rechtschreibfehlern (Hunspell über PyEnchant)\n",
|
||
"- Handling von Vector-Embeddings über Transformer-Modelle:\n",
|
||
" - höhere Fehlertoleranz (Rechtschreibung, redundante oder unbedeutende Worte)\n",
|
||
" - nicht angewiesen, dass jedes Wort im Vocabulary vorkommt (vgl. spaCy-Modell)\n",
|
||
" - bei ersten Versuchen höhere Genauigkeit bei der Erkennung tatsächlicher Duplikate\n",
|
||
"- Nutzung Vector-Embeddings für Duplikatfindung"
|
||
]
|
||
},
|
||
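{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "The last point, embeddings for duplicate detection, can be sketched with plain NumPy. Each embedding vector is normalised, the pairwise cosine similarities are computed, and every pair above a threshold is reported. This is a sketch under assumptions. The vectors below are hand-made stand-ins for transformer embeddings, the threshold of 0.9 is arbitrary, and `find_duplicate_pairs` is a hypothetical helper, not part of any library."
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "import numpy as np\n",
  "\n",
  "\n",
  "def find_duplicate_pairs(embeddings, threshold=0.9):\n",
  "    # normalise each row to unit length so the dot product equals cosine similarity\n",
  "    emb = np.asarray(embeddings, dtype=float)\n",
  "    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)\n",
  "    sim = emb @ emb.T\n",
  "    # keep only i < j (upper triangle without diagonal) so each pair is reported once;\n",
  "    # assumes threshold > 0, because the masked entries are set to 0\n",
  "    rows, cols = np.nonzero(np.triu(sim, k=1) >= threshold)\n",
  "    return [(int(i), int(j), float(sim[i, j])) for i, j in zip(rows, cols)]\n",
  "\n",
  "\n",
  "# hand-made stand-ins for transformer embeddings of three descriptions\n",
  "embeddings = np.array([\n",
  "    [1.0, 0.0, 0.0],   # e.g. 'Tägliche Prüfung'\n",
  "    [0.98, 0.2, 0.0],  # e.g. 'Tägliche Überprüfung'\n",
  "    [0.0, 1.0, 0.0],   # e.g. 'Docke angefahren!'\n",
  "])\n",
  "pairs = find_duplicate_pairs(embeddings)\n",
  "pairs"
 ]
},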
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### ---> Model Training: Data Set"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 27,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# data for model training\n",
|
||
"data = temp1.iloc[50:300,0].to_list()\n",
|
||
"data = [e for e in data if e != '']\n",
|
||
"\n",
|
||
"with open('spacy_train/training_data_2.txt','w', encoding='utf-8') as f:\n",
|
||
" f.writelines(\"\\n\".join(data))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 234,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# save/load dataframe\n",
|
||
"FILE_PATH = 'VorgangsBeschreibung_analyse_1.fth'\n",
|
||
"if LOAD_CALC_FILES:\n",
|
||
" temp1 = pd.read_feather(FILE_PATH)\n",
|
||
" temp1 = temp1.set_index('index')\n",
|
||
"else:\n",
|
||
" save_df = temp1.reset_index()\n",
|
||
" save_df.to_feather(FILE_PATH)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### spaCy"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 245,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"'Durchführung: Sollwert: 20 0,1g'"
|
||
]
|
||
},
|
||
"execution_count": 245,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"string = temp1.iloc[-2,0]\n",
|
||
"#string = temp1.iloc[0,0]\n",
|
||
"string"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 246,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"string = 'Ich spiele jeden Tag mit den Kindern im Garten. Das ist schön.'\n",
|
||
"string = 'Die Maschine XYZ ist aufgrund einer Störung im Druckluftsystem defekt.'\n",
|
||
"#string = 'The machine XYZ is broken because of a failure in the air pressure system.'\n",
|
||
"#string = 'Wir benötigen das Werkzeug von Herr Stöppel, um das derzeit abzuarbeiten.Dies wird durch Herrn Strebe getan.'"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 247,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"doc = nlp(string)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 248,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# simulate occurence counter\n",
|
||
"OCC_COUNTER = 10"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 249,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"SPELL_CHECK_NON_CHARS = set([' ', '.', ',', ';', ':', '-'])\n",
|
||
"CLEANING = True\n",
|
||
"#CLEANING = False\n",
|
||
"\n",
|
||
"def pre_clean_word(string: str) -> str:\n",
|
||
" \n",
|
||
" pattern = r'[^A-Za-zäöüÄÖÜ]+'\n",
|
||
" string = re.sub(pattern, '', string)\n",
|
||
" \"\"\"\n",
|
||
" for char in SPELL_CHECK_NON_CHARS:\n",
|
||
" string = string.replace(char, '')\n",
|
||
" \"\"\"\n",
|
||
" \n",
|
||
" return string\n",
|
||
"\n",
|
||
"# https://stackoverflow.com/questions/25341945/check-if-string-has-date-any-format \n",
|
||
"def is_str_date(string, fuzzy=False):\n",
|
||
" \n",
|
||
" try:\n",
|
||
" parse(string, fuzzy=fuzzy)\n",
|
||
" return True\n",
|
||
" except ValueError:\n",
|
||
" return False\n",
|
||
"\n",
|
||
"\n",
|
||
"def obtain_sub_tree(token):\n",
|
||
" # check if token is a POS of interest\n",
|
||
" descendants = list(token.subtree)\n",
|
||
" descendants.remove(token)\n",
|
||
" logger.debug(f'Token >>{token}<< has subtree >>{descendants}<<')\n",
|
||
" return descendants\n",
|
||
"\n",
|
||
"\n",
|
||
"def add_children_descendants(\n",
|
||
" parent,\n",
|
||
" weight,\n",
|
||
" connections,\n",
|
||
" unique_tokens,\n",
|
||
" children_sents,\n",
|
||
" map_2_word: dict[str, str] | None = None,\n",
|
||
"):\n",
|
||
" global CLEANING\n",
|
||
" # add child as key\n",
|
||
" if CLEANING:\n",
|
||
" parent_lemma = pre_clean_word(string=parent.lemma_)\n",
|
||
" \n",
|
||
" # map words\n",
|
||
" if word_2_map is not None:\n",
|
||
" if parent_lemma.lower() in map_2_word:\n",
|
||
" parent_lemma = map_2_word[parent_lemma.lower()]\n",
|
||
" #logger.info(f\"[SUCCESS] Mapped PARENT to {parent_lemma}\")\n",
|
||
" \n",
|
||
" if parent_lemma != '':\n",
|
||
" if (parent_lemma, parent.pos_) in connections:\n",
|
||
" connections[(parent_lemma, parent.pos_)].append(children_sents)\n",
|
||
" connections[(parent_lemma, parent.pos_)].append(children_sents)\n",
|
||
" #connections[parent.lemma_].append([descendant.lemma_, descendant])\n",
|
||
" else:\n",
|
||
" # do not add auxiliary words\n",
|
||
" if parent.pos_ != 'AUX':\n",
|
||
" unique_tokens.add(parent_lemma)\n",
|
||
" connections[(parent_lemma, parent.pos_)] = list()\n",
|
||
" connections[(parent_lemma, parent.pos_)].append(children_sents)\n",
|
||
" #connections[parent.lemma_].append([descendant.lemma_, descendant])\n",
|
||
" else:\n",
|
||
" if (parent.lemma_, parent.pos_) in connections:\n",
|
||
" connections[(parent.lemma_, parent.pos_)].append(children_sents)\n",
|
||
" connections[(parent.lemma_, parent.pos_)].append(children_sents)\n",
|
||
" #connections[parent.lemma_].append([descendant.lemma_, descendant])\n",
|
||
" else:\n",
|
||
" # do not add auxiliary words\n",
|
||
" if parent.pos_ != 'AUX':\n",
|
||
" unique_tokens.add(parent.lemma_)\n",
|
||
" connections[(parent.lemma_, parent.pos_)] = list()\n",
|
||
" connections[(parent.lemma_, parent.pos_)].append(children_sents)\n",
|
||
" #connections[parent.lemma_].append([descendant.lemma_, descendant])\n",
|
||
"\n",
|
||
"\n",
|
||
"def obtain_descendant_info(\n",
|
||
" doc,\n",
|
||
" weight,\n",
|
||
" POS_of_interest,\n",
|
||
" TAG_of_interest,\n",
|
||
" connections,\n",
|
||
" unique_tokens,\n",
|
||
" spell_check_candidates,\n",
|
||
" spell_check_whitelist,\n",
|
||
" spell_checker,\n",
|
||
" corrections,\n",
|
||
" map_2_word: dict[str, str] | None = None,\n",
|
||
"):\n",
|
||
" global GENERAL_BLACKLIST\n",
|
||
" global DESC_BLACKLIST\n",
|
||
" global CLEANING\n",
|
||
" \n",
|
||
" # iterate over sentences\n",
|
||
" for sent in doc.sents:\n",
|
||
" # [REWORK] spell check list\n",
|
||
" spell_check_words = list()\n",
|
||
" \n",
|
||
" # iterate over tokens in one sentence\n",
|
||
" for token in sent:\n",
|
||
" \n",
|
||
" if not (token.pos_ in POS_of_interest or token.tag_ in TAG_of_interest):\n",
|
||
" continue\n",
|
||
" elif token.lemma_.lower() in GENERAL_BLACKLIST:\n",
|
||
" logger.debug(f'Eliminated parent >>{token}<< because of blacklist')\n",
|
||
" continue\n",
|
||
" \n",
|
||
" # [REWORK] spell check\n",
|
||
" \"\"\"\n",
|
||
" if token.lemma_.lower() not in spell_check_whitelist:\n",
|
||
" word = pre_clean_word(string=token.lemma_.lower())\n",
|
||
" if word in corrections:\n",
|
||
" word = corrections[word]\n",
|
||
" elif not word.isdigit():\n",
|
||
" spell_check_words.append(word)\n",
|
||
" \"\"\"\n",
|
||
" \n",
|
||
" descendants = obtain_sub_tree(token=token)\n",
|
||
" \n",
|
||
" # iterate over all children if there are any\n",
|
||
" if descendants is not None:\n",
|
||
" # list with all children in the current sentence\n",
|
||
" children_sents = list()\n",
|
||
" \n",
|
||
" for child in descendants:\n",
|
||
" logger.debug(f'Token is >>{token}<< with child >>{child}<< and POS {child.pos_}')\n",
|
||
" \n",
|
||
" # elimnate cases of cross-references with verbs\n",
|
||
" if ((token.pos_ == 'AUX' or token.pos_ == 'VERB') and\n",
|
||
" (child.pos_ == 'AUX' or child.pos_ == 'VERB')):\n",
|
||
" continue\n",
|
||
" elif not (child.pos_ in POS_of_interest or child.tag_ in TAG_of_interest):\n",
|
||
" continue\n",
|
||
" elif child.lemma_.lower() in GENERAL_BLACKLIST:\n",
|
||
" logger.debug(f'Eliminated child >>{child}<< because of blacklist')\n",
|
||
" continue\n",
|
||
" \n",
|
||
" \n",
|
||
" if CLEANING:\n",
|
||
" child = pre_clean_word(string=child.lemma_)\n",
|
||
" if child == '':\n",
|
||
" continue\n",
|
||
" #child = pre_clean_word(string=child)\n",
|
||
" \n",
|
||
" if (child not in DESC_BLACKLIST and\n",
|
||
" not is_str_date(string=child)):\n",
|
||
" #not is_str_date(string=child.text)):\n",
|
||
" #children_sents.append((child.lemma_, weight))\n",
|
||
" \n",
|
||
" # map words\n",
|
||
" if map_2_word is not None:\n",
|
||
" if child.lower() in map_2_word:\n",
|
||
" child = map_2_word[child.lower()]\n",
|
||
" #logger.info(f\"[SUCCESS] Mapped CHILD to {child}\")\n",
|
||
" \n",
|
||
" children_sents.append((child, weight))\n",
|
||
" \n",
|
||
" #if child.lemma_ not in unique_tokens:\n",
|
||
" if child not in unique_tokens:\n",
|
||
" #unique_tokens.add(child.lemma_)\n",
|
||
" unique_tokens.add(child)\n",
|
||
" \n",
|
||
" else:\n",
|
||
" if (child.lemma_ not in DESC_BLACKLIST and\n",
|
||
" not is_str_date(string=child.text)):\n",
|
||
" children_sents.append((child.lemma_, weight))\n",
|
||
" \n",
|
||
" if child.lemma_ not in unique_tokens:\n",
|
||
" unique_tokens.add(child.lemma_)\n",
|
||
" \n",
|
||
" # [REWORK] spell check\n",
|
||
" \"\"\"\n",
|
||
" if child.lemma_.lower() not in spell_check_whitelist:\n",
|
||
" word = pre_clean_word(string=child.lemma_.lower())\n",
|
||
" if word in corrections:\n",
|
||
" word = corrections[word]\n",
|
||
" elif not word.isdigit():\n",
|
||
" spell_check_words.append(word)\n",
|
||
" \"\"\"\n",
|
||
" \n",
|
||
" # add list of children for current parent if not empty\n",
|
||
" if children_sents:\n",
|
||
" \n",
|
||
" add_children_descendants(\n",
|
||
" parent=token,\n",
|
||
" weight=weight,\n",
|
||
" connections=connections,\n",
|
||
" unique_tokens=unique_tokens,\n",
|
||
" children_sents=children_sents,\n",
|
||
" map_2_word=map_2_word,\n",
|
||
" )\n",
|
||
" \n",
|
||
" misspelled_candidates = spell_checker.unknown(spell_check_words)\n",
|
||
" spell_check_candidates.update(misspelled_candidates)"
|
||
]
|
||
},
|
||
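{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "The two cleaning helpers can be checked in isolation. The cell below restates them under new, hypothetical names (`clean_word`, `looks_like_date`), so it runs on its own without overwriting `pre_clean_word` and `is_str_date`. It assumes that `parse` is `dateutil.parser.parse`, as in the linked Stack Overflow answer. The character class includes `ß`, so words such as `Maß` are kept intact. `OverflowError` is caught in addition to `ValueError`, because dateutil documents it for out-of-range values. Note that dateutil is permissive: a bare number such as `5` may also be accepted as a date, so this check can filter out purely numeric tokens as well."
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "import re\n",
  "\n",
  "from dateutil.parser import parse\n",
  "\n",
  "\n",
  "def clean_word(string: str) -> str:\n",
  "    # keep only Latin letters, umlauts and ß; drop digits, punctuation and spaces\n",
  "    return re.sub(r'[^A-Za-zäöüÄÖÜß]+', '', string)\n",
  "\n",
  "\n",
  "def looks_like_date(string: str, fuzzy: bool = False) -> bool:\n",
  "    # True if dateutil can interpret the string as a date or time\n",
  "    try:\n",
  "        parse(string, fuzzy=fuzzy)\n",
  "        return True\n",
  "    except (ValueError, OverflowError):\n",
  "        return False\n",
  "\n",
  "\n",
  "print(clean_word('Maß-Band 2x'), looks_like_date('24.08.2023'), looks_like_date('Druckluftsystem'))"
 ]
},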
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 250,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def obtain_adj_matrix(unique_tokens, connections):\n",
|
||
"\n",
|
||
" adj_mat = pd.DataFrame(\n",
|
||
" data=0, \n",
|
||
" columns=list(unique_tokens), \n",
|
||
" index=list(unique_tokens),\n",
|
||
" dtype=np.uint32,\n",
|
||
" )\n",
|
||
" \n",
|
||
" for (pred, POS), descendants_list in connections.items():\n",
|
||
" #print(f'{pred=}, {descendants=}')\n",
|
||
" \n",
|
||
" for descendants in descendants_list:\n",
|
||
" #print(f'{descendants}')\n",
|
||
" \n",
|
||
" if POS != 'AUX':\n",
|
||
" for (desc, weight) in descendants:\n",
|
||
" adj_mat.at[pred, desc] += weight\n",
|
||
" \n",
|
||
" else:\n",
|
||
" if len(descendants) > 1:\n",
|
||
" # if auxiliary word, make connection between all associated words\n",
|
||
" combs = combinations(descendants, r=2)\n",
|
||
" \n",
|
||
" for comb in combs:\n",
|
||
" # comb is tuple ((word_1, weight), (word_2, weight))\n",
|
||
" weight = comb[0][1]\n",
|
||
" word_1 = comb[0][0]\n",
|
||
" word_2 = comb[1][0]\n",
|
||
" \n",
|
||
" \"\"\"\n",
|
||
" if ((word_1 == 'Eigenverantwortlichkeit' or word_1 == 'neu') and\n",
|
||
" (word_2 == 'Eigenverantwortlichkeit' or word_2 == 'neu')):\n",
|
||
" print(f'Hello from {pred=} with {descendants=}')\n",
|
||
" \"\"\"\n",
|
||
" \n",
|
||
" adj_mat.at[word_1, word_2] += weight\n",
|
||
" \n",
|
||
" return adj_mat\n",
|
||
"\n",
|
||
"\n",
|
||
"def make_undir_adj_matrix(adj_mat):\n",
|
||
" \n",
|
||
" adj_mat_undir = adj_mat.copy()\n",
|
||
" arr = adj_mat_undir.to_numpy()\n",
|
||
" arr_upper = np.triu(arr)\n",
|
||
" arr_lower = np.tril(arr)\n",
|
||
" arr_lower = np.rot90(np.fliplr(arr_lower))\n",
|
||
" arr_new = arr_lower + arr_upper\n",
|
||
" \n",
|
||
" adj_mat_undir.loc[:] = arr_new\n",
|
||
" \n",
|
||
" return adj_mat_undir"
|
||
]
|
||
},
|
||
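{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "`make_undir_adj_matrix` turns the directed word-to-word weights into undirected ones. It adds each entry below the diagonal to its mirror entry above the diagonal, so the result lives in the upper triangle. Below is a toy 3×3 example using NumPy only. `fold_to_upper` is a hypothetical name, and the weights are made up. The total weight is preserved, which is a cheap sanity check that could also be applied to the real matrix."
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "import numpy as np\n",
  "\n",
  "\n",
  "def fold_to_upper(arr: np.ndarray) -> np.ndarray:\n",
  "    # mirror the strictly lower triangle onto the upper triangle;\n",
  "    # np.triu keeps the diagonal, so self-loops are counted only once\n",
  "    return np.triu(arr) + np.tril(arr, k=-1).T\n",
  "\n",
  "\n",
  "a = np.array([\n",
  "    [0, 2, 0],\n",
  "    [1, 0, 3],\n",
  "    [4, 0, 5],\n",
  "])\n",
  "folded = fold_to_upper(a)\n",
  "# rot90(fliplr(x)) is just the transpose of x\n",
  "print(np.array_equal(np.rot90(np.fliplr(a)), a.T))\n",
  "folded"
 ]
},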
{
|
||
"cell_type": "code",
|
||
"execution_count": 251,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<span class=\"tex2jax_ignore\"><svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" xml:lang=\"de\" id=\"1293a8bd098c40caafc3b29af76d443f-0\" class=\"displacy\" width=\"1800\" height=\"399.5\" direction=\"ltr\" style=\"max-width: none; height: 399.5px; color: #000000; background: #ffffff; font-family: Arial; direction: ltr\">\n",
|
||
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
|
||
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"50\">Die</tspan>\n",
|
||
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"50\">DET</tspan>\n",
|
||
"</text>\n",
|
||
"\n",
|
||
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
|
||
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"225\">Maschine</tspan>\n",
|
||
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"225\">NOUN</tspan>\n",
|
||
"</text>\n",
|
||
"\n",
|
||
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
|
||
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"400\">XYZ</tspan>\n",
|
||
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"400\">PROPN</tspan>\n",
|
||
"</text>\n",
|
||
"\n",
|
||
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
|
||
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"575\">ist</tspan>\n",
|
||
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"575\">AUX</tspan>\n",
|
||
"</text>\n",
|
||
"\n",
|
||
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
|
||
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"750\">aufgrund</tspan>\n",
|
||
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"750\">ADP</tspan>\n",
|
||
"</text>\n",
|
||
"\n",
|
||
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
|
||
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"925\">einer</tspan>\n",
|
||
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"925\">DET</tspan>\n",
|
||
"</text>\n",
|
||
"\n",
|
||
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
|
||
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"1100\">Störung</tspan>\n",
|
||
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"1100\">NOUN</tspan>\n",
|
||
"</text>\n",
|
||
"\n",
|
||
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
|
||
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"1275\">im</tspan>\n",
|
||
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"1275\">ADP</tspan>\n",
|
||
"</text>\n",
|
||
"\n",
|
||
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
|
||
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"1450\">Druckluftsystem</tspan>\n",
|
||
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"1450\">NOUN</tspan>\n",
|
||
"</text>\n",
|
||
"\n",
|
||
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
|
||
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"1625\">defekt.</tspan>\n",
|
||
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"1625\">ADV</tspan>\n",
|
||
"</text>\n",
|
||
"\n",
|
||
"<g class=\"displacy-arrow\">\n",
|
||
" <path class=\"displacy-arc\" id=\"arrow-1293a8bd098c40caafc3b29af76d443f-0-0\" stroke-width=\"2px\" d=\"M70,264.5 C70,177.0 215.0,177.0 215.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
|
||
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
|
||
" <textPath xlink:href=\"#arrow-1293a8bd098c40caafc3b29af76d443f-0-0\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">nk</textPath>\n",
|
||
" </text>\n",
|
||
" <path class=\"displacy-arrowhead\" d=\"M70,266.5 L62,254.5 78,254.5\" fill=\"currentColor\"/>\n",
|
||
"</g>\n",
|
||
"\n",
|
||
"<g class=\"displacy-arrow\">\n",
|
||
" <path class=\"displacy-arc\" id=\"arrow-1293a8bd098c40caafc3b29af76d443f-0-1\" stroke-width=\"2px\" d=\"M245,264.5 C245,89.5 570.0,89.5 570.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
|
||
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
|
||
" <textPath xlink:href=\"#arrow-1293a8bd098c40caafc3b29af76d443f-0-1\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">sb</textPath>\n",
|
||
" </text>\n",
|
||
" <path class=\"displacy-arrowhead\" d=\"M245,266.5 L237,254.5 253,254.5\" fill=\"currentColor\"/>\n",
|
||
"</g>\n",
|
||
"\n",
|
||
"<g class=\"displacy-arrow\">\n",
|
||
" <path class=\"displacy-arc\" id=\"arrow-1293a8bd098c40caafc3b29af76d443f-0-2\" stroke-width=\"2px\" d=\"M245,264.5 C245,177.0 390.0,177.0 390.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
|
||
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
|
||
" <textPath xlink:href=\"#arrow-1293a8bd098c40caafc3b29af76d443f-0-2\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">nk</textPath>\n",
|
||
" </text>\n",
|
||
" <path class=\"displacy-arrowhead\" d=\"M390.0,266.5 L398.0,254.5 382.0,254.5\" fill=\"currentColor\"/>\n",
|
||
"</g>\n",
|
||
"\n",
|
||
"<g class=\"displacy-arrow\">\n",
|
||
" <path class=\"displacy-arc\" id=\"arrow-1293a8bd098c40caafc3b29af76d443f-0-3\" stroke-width=\"2px\" d=\"M595,264.5 C595,177.0 740.0,177.0 740.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
|
||
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
|
||
" <textPath xlink:href=\"#arrow-1293a8bd098c40caafc3b29af76d443f-0-3\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">mo</textPath>\n",
|
||
" </text>\n",
|
||
" <path class=\"displacy-arrowhead\" d=\"M740.0,266.5 L748.0,254.5 732.0,254.5\" fill=\"currentColor\"/>\n",
|
||
"</g>\n",
|
||
"\n",
|
||
"<g class=\"displacy-arrow\">\n",
|
||
" <path class=\"displacy-arc\" id=\"arrow-1293a8bd098c40caafc3b29af76d443f-0-4\" stroke-width=\"2px\" d=\"M945,264.5 C945,177.0 1090.0,177.0 1090.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
|
||
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
|
||
" <textPath xlink:href=\"#arrow-1293a8bd098c40caafc3b29af76d443f-0-4\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">nk</textPath>\n",
|
||
" </text>\n",
|
||
" <path class=\"displacy-arrowhead\" d=\"M945,266.5 L937,254.5 953,254.5\" fill=\"currentColor\"/>\n",
|
||
"</g>\n",
|
||
"\n",
|
||
"<g class=\"displacy-arrow\">\n",
|
||
" <path class=\"displacy-arc\" id=\"arrow-1293a8bd098c40caafc3b29af76d443f-0-5\" stroke-width=\"2px\" d=\"M770,264.5 C770,89.5 1095.0,89.5 1095.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
|
||
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
|
||
" <textPath xlink:href=\"#arrow-1293a8bd098c40caafc3b29af76d443f-0-5\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">nk</textPath>\n",
|
||
" </text>\n",
|
||
" <path class=\"displacy-arrowhead\" d=\"M1095.0,266.5 L1103.0,254.5 1087.0,254.5\" fill=\"currentColor\"/>\n",
|
||
"</g>\n",
|
||
"\n",
|
||
"<g class=\"displacy-arrow\">\n",
|
||
" <path class=\"displacy-arc\" id=\"arrow-1293a8bd098c40caafc3b29af76d443f-0-6\" stroke-width=\"2px\" d=\"M1120,264.5 C1120,177.0 1265.0,177.0 1265.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
|
||
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
|
||
" <textPath xlink:href=\"#arrow-1293a8bd098c40caafc3b29af76d443f-0-6\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">mnr</textPath>\n",
|
||
" </text>\n",
|
||
" <path class=\"displacy-arrowhead\" d=\"M1265.0,266.5 L1273.0,254.5 1257.0,254.5\" fill=\"currentColor\"/>\n",
|
||
"</g>\n",
|
||
"\n",
|
||
"<g class=\"displacy-arrow\">\n",
|
||
" <path class=\"displacy-arc\" id=\"arrow-1293a8bd098c40caafc3b29af76d443f-0-7\" stroke-width=\"2px\" d=\"M1295,264.5 C1295,177.0 1440.0,177.0 1440.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
|
||
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
|
||
" <textPath xlink:href=\"#arrow-1293a8bd098c40caafc3b29af76d443f-0-7\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">nk</textPath>\n",
|
||
" </text>\n",
|
||
" <path class=\"displacy-arrowhead\" d=\"M1440.0,266.5 L1448.0,254.5 1432.0,254.5\" fill=\"currentColor\"/>\n",
|
||
"</g>\n",
|
||
"\n",
|
||
"<g class=\"displacy-arrow\">\n",
|
||
" <path class=\"displacy-arc\" id=\"arrow-1293a8bd098c40caafc3b29af76d443f-0-8\" stroke-width=\"2px\" d=\"M595,264.5 C595,2.0 1625.0,2.0 1625.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
|
||
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
|
||
" <textPath xlink:href=\"#arrow-1293a8bd098c40caafc3b29af76d443f-0-8\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">pd</textPath>\n",
|
||
" </text>\n",
|
||
" <path class=\"displacy-arrowhead\" d=\"M1625.0,266.5 L1633.0,254.5 1617.0,254.5\" fill=\"currentColor\"/>\n",
|
||
"</g>\n",
|
||
"</svg></span>"
|
||
],
|
||
"text/plain": [
|
||
"<IPython.core.display.HTML object>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"spacy.displacy.render(doc)"
|
||
]
|
||
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Full dataset"
]
},
{
"cell_type": "code",
"execution_count": 252,
"metadata": {},
"outputs": [],
"source": [
"# analyse the full dataset (uncomment the slice below to restrict it to the first entries for testing)\n",
"descr = temp1[['descr', 'num_occur']]\n",
"#descr = descr.iloc[:7,:]"
]
},
{
|
||
"cell_type": "code",
|
||
"execution_count": 253,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"#descr.iat[0,0] = 'Das ist ein Test am 24.08.2023'"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 254,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"6753"
|
||
]
|
||
},
|
||
"execution_count": 254,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"len(descr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 255,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>descr</th>\n",
|
||
" <th>num_occur</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>161</th>\n",
|
||
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
|
||
" <td>92592</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>33</th>\n",
|
||
" <td>Wöchentliche Sichtkontrolle Reinigung</td>\n",
|
||
" <td>1654</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>130</th>\n",
|
||
" <td>Tägliche Überprüfung der Ölabscheider</td>\n",
|
||
" <td>1616</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>159</th>\n",
|
||
" <td>Wöchentliche Kontrolle der WC-Anlagen</td>\n",
|
||
" <td>1265</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>139</th>\n",
|
||
" <td>Halbjährliche Kontrolle des Stabbreithalters</td>\n",
|
||
" <td>687</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2665</th>\n",
|
||
" <td>Überprüfung der Y-Achse Schneidbrücke am LC 2 ...</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2664</th>\n",
|
||
" <td>Luftschlauch muss ausgetauscht werden. Ist und...</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2663</th>\n",
|
||
" <td>Riemenscheibe tauschen auf 650 UPM</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2660</th>\n",
|
||
" <td>Durchführung: Sollwert: 20 0,1g</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6752</th>\n",
|
||
" <td>Befestigung Deckel für Batteriefach defekt Hal...</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>6753 rows × 2 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" descr num_occur\n",
|
||
"161 Tägliche Wartungstätigkeiten nach Vorgabe des ... 92592\n",
|
||
"33 Wöchentliche Sichtkontrolle Reinigung 1654\n",
|
||
"130 Tägliche Überprüfung der Ölabscheider 1616\n",
|
||
"159 Wöchentliche Kontrolle der WC-Anlagen 1265\n",
|
||
"139 Halbjährliche Kontrolle des Stabbreithalters 687\n",
|
||
"... ... ...\n",
|
||
"2665 Überprüfung der Y-Achse Schneidbrücke am LC 2 ... 1\n",
|
||
"2664 Luftschlauch muss ausgetauscht werden. Ist und... 1\n",
|
||
"2663 Riemenscheibe tauschen auf 650 UPM 1\n",
|
||
"2660 Durchführung: Sollwert: 20 0,1g 1\n",
|
||
"6752 Befestigung Deckel für Batteriefach defekt Hal... 1\n",
|
||
"\n",
|
||
"[6753 rows x 2 columns]"
|
||
]
|
||
},
|
||
"execution_count": 255,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"descr"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 256,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"#LOAD_CALC_FILES = True\n",
|
||
"#LOAD_CALC_FILES = False\n",
|
||
"#IS_TEST = True\n",
|
||
"IS_TEST = False"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 257,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"spell_check_whitelist = {\n",
|
||
" '',\n",
|
||
" 'beschlag',\n",
|
||
" 'brandschutztechnische',\n",
|
||
" 'dichtung',\n",
|
||
" 'festhaltevorrichtung',\n",
|
||
" 'funktion',\n",
|
||
" 'halbjährliche',\n",
|
||
" 'kontrolle',\n",
|
||
" 'maschinenhersteller',\n",
|
||
" 'prüfung',\n",
|
||
" 'reinigung',\n",
|
||
" 'scharnier',\n",
|
||
" 'schließvorrichtung',\n",
|
||
" 'schmierung',\n",
|
||
" 'sichtkontrolle',\n",
|
||
" 'stabbreithalter',\n",
|
||
" 'technikrundgang',\n",
|
||
" 'vorgabe',\n",
|
||
" 'wartungstätigkeit',\n",
|
||
" 'wcanlage',\n",
|
||
" 'ölabscheider',\n",
|
||
" 'abarbeiten',\n",
|
||
" 'abgleichen',\n",
|
||
" 'abschmieren',\n",
|
||
" 'abschmierung',\n",
|
||
" 'abteilungsleiter',\n",
|
||
" 'akku',\n",
|
||
" 'analyse',\n",
|
||
" 'arbeitsplan',\n",
|
||
" 'aschenbecher',\n",
|
||
" 'auffüllen',\n",
|
||
" 'auflistung',\n",
|
||
" 'befestigungsschraube',\n",
|
||
" 'beschädigung',\n",
|
||
" 'betriebsstunde',\n",
|
||
" 'blombe',\n",
|
||
" 'blombieren',\n",
|
||
" 'brückner',\n",
|
||
" 'campenabwickler',\n",
|
||
" 'campenaufwickler',\n",
|
||
" 'desinfektionsmittel',\n",
|
||
" 'dichtigkeit',\n",
|
||
" 'druckkontrolle',\n",
|
||
" 'efficiosystem',\n",
|
||
" 'eigenverantwortlichkeit',\n",
|
||
" 'einrichtung',\n",
|
||
" 'email',\n",
|
||
" 'erledigungsdatum',\n",
|
||
" 'extradate',\n",
|
||
" 'extradatum',\n",
|
||
" 'filter',\n",
|
||
" 'firma',\n",
|
||
" 'formplatte',\n",
|
||
" 'frostprävention',\n",
|
||
" 'gegendruckbolze',\n",
|
||
" 'gesamtanlage',\n",
|
||
" 'heizungsanlage',\n",
|
||
" 'keller',\n",
|
||
" 'kesselhauskontrolle',\n",
|
||
" 'kesselwasser',\n",
|
||
" 'koffer',\n",
|
||
" 'kompensator',\n",
|
||
" 'kompressorstation',\n",
|
||
" 'kondensat',\n",
|
||
" 'kühlturm',\n",
|
||
" 'kühltürme',\n",
|
||
" 'lager',\n",
|
||
" 'laserabteilung',\n",
|
||
" 'leckage',\n",
|
||
" 'leerung',\n",
|
||
" 'leiterprüfung',\n",
|
||
" 'linearkugellager',\n",
|
||
" 'luftdruckkontrolle',\n",
|
||
" 'magazin',\n",
|
||
" 'maschinenbediener',\n",
|
||
" 'messwert',\n",
|
||
" 'monat',\n",
|
||
" 'motor',\n",
|
||
" 'papiermüllbehälter',\n",
|
||
" 'personalbüro',\n",
|
||
" 'pflasterschrank',\n",
|
||
" 'rieme',\n",
|
||
" 'rollenkette',\n",
|
||
" 'rundgang',\n",
|
||
" 'schweißkopf',\n",
|
||
" 'schweisskopf',\n",
|
||
" 'sichtprüfung',\n",
|
||
" 'speisewasser',\n",
|
||
" 'sprinkleranlage',\n",
|
||
" 'temperatursensor',\n",
|
||
" 'terminieren',\n",
|
||
" 'ticket',\n",
|
||
" 'trommel',\n",
|
||
" 'täglicher',\n",
|
||
" 'uvröhre',\n",
|
||
" 'ventilator',\n",
|
||
" 'verbandsmaterial',\n",
|
||
" 'verschleiß',\n",
|
||
" 'verschleiss',\n",
|
||
" 'vorbelegung',\n",
|
||
" 'wartung',\n",
|
||
" 'wartungsarbeit',\n",
|
||
" 'wartungsplan',\n",
|
||
" 'wasseraufbereitung',\n",
|
||
" 'wasseraufbereitungsanlage',\n",
|
||
" 'wasserverbrauch',\n",
|
||
" 'weberei',\n",
|
||
" 'wumagtrockner',\n",
|
||
" 'wäscherkontrolle',\n",
|
||
" 'wöchig',\n",
|
||
" 'abdichten',\n",
|
||
" 'abfluprüfung',\n",
|
||
" 'ablesen',\n",
|
||
" 'abluftkanal',\n",
|
||
" 'absauganlage',\n",
|
||
" 'abspeichern',\n",
|
||
" 'absprache',\n",
|
||
" 'aktivkohlepatron',\n",
|
||
" 'aktivkohlepatrone',\n",
|
||
" 'anbackung',\n",
|
||
" 'anfragen',\n",
|
||
" 'angebot',\n",
|
||
" 'anpresswalze',\n",
|
||
" 'ansaug',\n",
|
||
" 'anschluss',\n",
|
||
" 'anschluß',\n",
|
||
" 'anzahl',\n",
|
||
" 'auen',\n",
|
||
" 'auenbereich',\n",
|
||
" 'aueneinheit',\n",
|
||
" 'aufwickler',\n",
|
||
" 'ausblasöffnung',\n",
|
||
" 'ausbrennen',\n",
|
||
" 'auslassventil',\n",
|
||
" 'ausrüstung',\n",
|
||
" 'austausch',\n",
|
||
" 'axialpendelrollenlager',\n",
|
||
" 'batteriewechsel',\n",
|
||
" 'batterieüberprüfung',\n",
|
||
" 'baugruppe',\n",
|
||
" 'baumwolltuch',\n",
|
||
" 'bauteil',\n",
|
||
" 'befeuchter',\n",
|
||
" 'beleuchtung',\n",
|
||
" 'beschichtunglegierung',\n",
|
||
" 'besprechungszimmer',\n",
|
||
" 'bestandskontrolle',\n",
|
||
" 'bestellformular',\n",
|
||
" 'bestätigung',\n",
|
||
" 'bezeichnung',\n",
|
||
" 'binder',\n",
|
||
" 'blutstop',\n",
|
||
" 'bolze',\n",
|
||
" 'breitstreckwalze',\n",
|
||
" 'containerstellfläche',\n",
|
||
" 'contrawalze',\n",
|
||
" 'dachfläche',\n",
|
||
" 'dampfzylinder',\n",
|
||
" 'deformierung',\n",
|
||
" 'dezember',\n",
|
||
" 'din',\n",
|
||
" 'docke',\n",
|
||
" 'dokumentation',\n",
|
||
" 'dosierpumpe',\n",
|
||
" 'druckluftbehälter',\n",
|
||
" 'druckluftleitung',\n",
|
||
" 'druckluftschläuche',\n",
|
||
" 'drucktestkontrolle',\n",
|
||
" 'einterminieren',\n",
|
||
" 'eintragung',\n",
|
||
" 'einzelprotokoll',\n",
|
||
" 'einziehwalze',\n",
|
||
" 'elektisch',\n",
|
||
" 'element',\n",
|
||
" 'enthärtung',\n",
|
||
" 'entwässern',\n",
|
||
" 'erledigungsbeschreibeung',\n",
|
||
" 'erstehilfeeinrichtung',\n",
|
||
" 'erweiterung',\n",
|
||
" 'explosionsschutzanlage',\n",
|
||
" 'extradaten',\n",
|
||
" 'exzenterringbefestigung',\n",
|
||
" 'fa',\n",
|
||
" 'fach',\n",
|
||
" 'faltenbalge',\n",
|
||
" 'feedbackinput',\n",
|
||
" 'feuerwehrumfahrung',\n",
|
||
" 'filert',\n",
|
||
" 'filteranlage',\n",
|
||
" 'filterelement',\n",
|
||
" 'filterstufe',\n",
|
||
" 'fixtermin',\n",
|
||
" 'flanschlager',\n",
|
||
" 'flanschlagerquadrat',\n",
|
||
" 'fluchtwegsymbol',\n",
|
||
" 'flusenabsaugrohr',\n",
|
||
" 'freilauf',\n",
|
||
" 'fremdkörper',\n",
|
||
" 'führungswagen',\n",
|
||
" 'gaslager',\n",
|
||
" 'gaszählerstand',\n",
|
||
" 'gatter',\n",
|
||
" 'geräteinner',\n",
|
||
" 'geräteinneres',\n",
|
||
" 'geräusch',\n",
|
||
" 'gesamt',\n",
|
||
" 'gesamterzeugt',\n",
|
||
" 'getränkeautomat',\n",
|
||
" 'gewindebefestigung',\n",
|
||
" 'gewindestiftbefestigung',\n",
|
||
" 'gleitschiene',\n",
|
||
" 'grat',\n",
|
||
" 'gro',\n",
|
||
" 'grundplatte',\n",
|
||
" 'halle',\n",
|
||
" 'haupteingang',\n",
|
||
" 'hebebühne',\n",
|
||
" 'hebezeug',\n",
|
||
" 'helm',\n",
|
||
" 'hersteller',\n",
|
||
" 'hochregal',\n",
|
||
" 'hochtemperatur',\n",
|
||
" 'hochtemperatureinsatz',\n",
|
||
" 'hydraulik',\n",
|
||
" 'hydrauliköl',\n",
|
||
" 'impulseingang',\n",
|
||
" 'indikator',\n",
|
||
" 'inneneinheit',\n",
|
||
" 'insektenvernichter',\n",
|
||
" 'kabel',\n",
|
||
" 'kammer',\n",
|
||
" 'karton',\n",
|
||
" 'kegelradgetriebe',\n",
|
||
" 'kegelradgetriebemotor',\n",
|
||
" 'kette',\n",
|
||
" 'klemmrolle',\n",
|
||
" 'klimaanlage',\n",
|
||
" 'klimabühne',\n",
|
||
" 'klimagerät',\n",
|
||
" 'kompressor',\n",
|
||
" 'kompressorluftwert',\n",
|
||
" 'kontoll',\n",
|
||
" 'kontrawalze',\n",
|
||
" 'kontroll',\n",
|
||
" 'krankheit',\n",
|
||
" 'krän',\n",
|
||
" 'kräne',\n",
|
||
" 'kuehlaggregat',\n",
|
||
" 'kw',\n",
|
||
" 'kühlgerät',\n",
|
||
" 'lagereinheit',\n",
|
||
" 'lagereinsatz',\n",
|
||
" 'lagerort',\n",
|
||
" 'lagerung',\n",
|
||
" 'laser',\n",
|
||
" 'laufgeräusche',\n",
|
||
" 'luftansaugseite',\n",
|
||
" 'luftfilter',\n",
|
||
" 'luftfilterwasserabscheider',\n",
|
||
" 'luftmenge',\n",
|
||
" 'luftreiniger',\n",
|
||
" 'lösungsmittel',\n",
|
||
" 'lüftungsanlage',\n",
|
||
" 'macke',\n",
|
||
" 'managementsystem',\n",
|
||
" 'maschinenanschluss',\n",
|
||
" 'materialzersetzung',\n",
|
||
" 'messlager',\n",
|
||
" 'micron',\n",
|
||
" 'mischer',\n",
|
||
" 'monatlicher',\n",
|
||
" 'monatliches',\n",
|
||
" 'monteur',\n",
|
||
" 'moos',\n",
|
||
" 'motorstart',\n",
|
||
" 'nachfetten',\n",
|
||
" 'nachschmieren',\n",
|
||
" 'nachspann',\n",
|
||
" 'neuvertrag',\n",
|
||
" 'nord',\n",
|
||
" 'nottelefon',\n",
|
||
" 'nr',\n",
|
||
" 'oberer',\n",
|
||
" 'oberflächenkontrolle',\n",
|
||
" 'objektkarte',\n",
|
||
" 'palette',\n",
|
||
" 'pendelkugellager',\n",
|
||
" 'pfeifer',\n",
|
||
" 'platine',\n",
|
||
" 'pneum',\n",
|
||
" 'pneumatikventil',\n",
|
||
" 'pneumatisch',\n",
|
||
" 'pos',\n",
|
||
" 'positioniersystem',\n",
|
||
" 'prozesskennzahl',\n",
|
||
" 'prüfbericht',\n",
|
||
" 'prüfplan',\n",
|
||
" 'rampenbereich',\n",
|
||
" 'rauwalze',\n",
|
||
" 'regalprüfer',\n",
|
||
" 'regalsicherungsanlage',\n",
|
||
" 'reiniger',\n",
|
||
" 'reinigungstuch',\n",
|
||
" 'restlich',\n",
|
||
" 'risikoersatzteil',\n",
|
||
" 'rohrtrenner',\n",
|
||
" 'roller',\n",
|
||
" 'rundgangkontrollen',\n",
|
||
" 'rückmeldung',\n",
|
||
" 'sae',\n",
|
||
" 'sauberkeit',\n",
|
||
" 'schlitten',\n",
|
||
" 'schmierstoff',\n",
|
||
" 'schmierstoffmenge',\n",
|
||
" 'schneider',\n",
|
||
" 'schraube',\n",
|
||
" 'schraubenbestand',\n",
|
||
" 'schutzabdeckung',\n",
|
||
" 'sicherheitsbeleuchtung',\n",
|
||
" 'sicherheitseinrichtung',\n",
|
||
" 'sicherheitslichtschranke',\n",
|
||
" 'sicherheitsweste',\n",
|
||
" 'sicherstellung',\n",
|
||
" 'sonotrode',\n",
|
||
" 'sonotrodenständer',\n",
|
||
" 'spannkopflager',\n",
|
||
" 'spannlager',\n",
|
||
" 'spannrahmen',\n",
|
||
" 'spindel',\n",
|
||
" 'spindelhubgetriebe',\n",
|
||
" 'spindelmutter',\n",
|
||
" 'spülzeitprüfung',\n",
|
||
" 'stab',\n",
|
||
" 'stadtwasser',\n",
|
||
" 'stehlager',\n",
|
||
" 'stehlagergehäuse',\n",
|
||
" 'steuerung',\n",
|
||
" 'stückliste',\n",
|
||
" 'systemumstellung',\n",
|
||
" 'telefonanlage',\n",
|
||
" 'telefonat',\n",
|
||
" 'termin',\n",
|
||
" 'terminabsprache',\n",
|
||
" 'terminiern',\n",
|
||
" 'terminiert',\n",
|
||
" 'terminierung',\n",
|
||
" 'terminvorschlag',\n",
|
||
" 'testomat',\n",
|
||
" 'thermoheizelement',\n",
|
||
" 'torsprechanlage',\n",
|
||
" 'trinkwassernetz',\n",
|
||
" 'trockenzylinder',\n",
|
||
" 'tänzerrolle',\n",
|
||
" 'türdichtung',\n",
|
||
" 'türgriff',\n",
|
||
" 'türsicherung',\n",
|
||
" 'umlenkwalzen',\n",
|
||
" 'umrandung',\n",
|
||
" 'unkraut',\n",
|
||
" 'uschienenführung',\n",
|
||
" 'uvv',\n",
|
||
" 'ventil',\n",
|
||
" 'verbaut',\n",
|
||
" 'verbrennungsset',\n",
|
||
" 'vereinbarung',\n",
|
||
" 'verkalkung',\n",
|
||
" 'verschleiteileinsatz',\n",
|
||
" 'verschmutzung',\n",
|
||
" 'verschmutzungenlos',\n",
|
||
" 'verstellung',\n",
|
||
" 'verunreinigung',\n",
|
||
" 'vollständigkeit',\n",
|
||
" 'volumenzähler',\n",
|
||
" 'vorderer',\n",
|
||
" 'vordruck',\n",
|
||
" 'vorfilter',\n",
|
||
" 'vorfilterflie',\n",
|
||
" 'vorliegen',\n",
|
||
" 'vormonat',\n",
|
||
" 'wartungsintervall',\n",
|
||
" 'wartungsvertrag',\n",
|
||
" 'wasserfilter',\n",
|
||
" 'wasserhärte',\n",
|
||
" 'wasserpegelkontrolle',\n",
|
||
" 'wasserzählerstand',\n",
|
||
" 'wechselintervall',\n",
|
||
" 'wärmetauscher',\n",
|
||
" 'zahnrieme',\n",
|
||
" 'zahnstange',\n",
|
||
" 'zuleitung',\n",
|
||
" 'zuschicken',\n",
|
||
" 'ölfüllung',\n",
|
||
" 'ölstand',\n",
|
||
" 'ölstandsichtprüfung',\n",
|
||
" 'ölstandskontrolle',\n",
|
||
" 'überziehen'\n",
|
||
"}"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 258,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"corrections: dict[str, str] = {\n",
|
||
" 'desifektionsmittel': 'desinfektionsmittel',\n",
|
||
" 'schweikopf': 'schweisskopf',\n",
|
||
"}"
|
||
]
|
||
},
|
||
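The whitelist and the corrections dictionary are handed to `obtain_descendant_info` together with a `SpellChecker` (presumably from pyspellchecker, created with `language='de', distance=1` in the processing cell); the helper itself is defined outside this section. A minimal sketch of how such a normalisation step could combine them, assuming tokens are compared in lower case, corrections are applied before the whitelist check, and every word that is neither whitelisted nor known is collected for manual review. `normalise_token` and the toy `known_words` set (standing in for the spell checker's dictionary) are hypothetical:

```python
def normalise_token(
    token: str,
    whitelist: set[str],
    corrections: dict[str, str],
    known_words: set[str],
    candidates: set[str],
) -> str:
    """Return the normalised token and record unknown words in `candidates`."""
    word = token.lower()
    # explicit corrections (hypothetical order) are applied first
    word = corrections.get(word, word)
    # whitelisted domain terms and dictionary words are accepted unchanged
    if word in whitelist or word in known_words:
        return word
    # everything else is remembered as a spell-check candidate
    candidates.add(word)
    return word


# toy data, named so that the notebook's real variables are not overwritten
toy_whitelist = {'schweisskopf', 'desinfektionsmittel', 'ölabscheider'}
toy_corrections = {'desifektionsmittel': 'desinfektionsmittel', 'schweikopf': 'schweisskopf'}
toy_known_words = {'prüfen', 'der'}
toy_candidates: set[str] = set()

tokens = ['Schweikopf', 'prüfen', 'Ölabscheider', 'Riemenscheib']
normalised = [
    normalise_token(t, toy_whitelist, toy_corrections, toy_known_words, toy_candidates)
    for t in tokens
]
print(normalised)      # ['schweisskopf', 'prüfen', 'ölabscheider', 'riemenscheib']
print(toy_candidates)  # {'riemenscheib'}
```

Keeping the corrections separate from the whitelist means a known misspelling is rewritten to its canonical form instead of being accepted as a new token.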
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Discovered groups** (German terms are kept as they appear in the data; English glosses of the group names are given in parentheses)\n",
"- Prüfung (inspection):\n",
"    - Prüfen\n",
"    - Sichtprüfung\n",
"    - Überprüfung / überprüfen\n",
"    - Kontrolle / kontrollieren\n",
"    - sicherstellen / Sicherstellung\n",
"    - Wartung / warten\n",
"    - Reinigung / reinigen\n",
"    - Prüfbericht\n",
"- Handlung (action):\n",
"    - Schmierung\n",
"    - schmieren\n",
"    - reinigen\n",
"    - Reinigung\n",
"    - schneiden / nachschneiden\n",
"- zyklisch (recurring interval):\n",
"    - täglich\n",
"    - wöchentlich\n",
"    - monatlich\n",
"    - jährlich\n",
"- Datum (date and time):\n",
"    - Uhr\n",
"    - Montag, Dienstag, Mittwoch, Donnerstag, Freitag, Samstag, Sonntag\n",
"- Kleinteile (small parts):\n",
"    - Schraube\n",
"    - Adapter\n",
"    - Halterung\n",
"    - Scheibe\n",
"    - Gewinde\n",
"    - Ventil\n",
"    - Schalter\n",
"    - Befestigungsschraube\n",
"- Komponenten (components):\n",
"    - Kupplung\n",
"    - Motor\n",
"    - Getriebe\n",
"    - Ventilator\n",
"    - Zahnriemen\n",
"    - Transformator\n",
"    - Filterelement\n",
"    - Dosierpumpe\n",
"    - Luftschlauch\n",
"    - Dichtung\n",
"    - Filter\n",
"    - Scharnier\n",
"    - Spannrolle\n",
"    - Druckluftbehälter\n",
"    - Kette\n",
"    - Anschlüsse\n",
"    - Schläuche\n",
"    - Beleuchtung\n",
"- Elektrik (electrical):\n",
"    - Zuleitung\n",
"    - Kabel\n",
"    - Steckdose\n",
"    - Elektriker\n",
"    - Elektronik\n",
"    - elektrisch\n",
"    - Sicherheitsbeleuchtung\n",
"- Anlagen (plants and machines):\n",
"    - Mischanlage\n",
"    - Maschine\n",
"    - Wasserenthärtungsanlage\n",
"    - Lüftungsanlage\n",
"    - Klimaanlage\n",
"- Vereinbarung (agreements and scheduling):\n",
"    - Wartungsvertrag\n",
"    - Neuvertrag\n",
"    - Vertrag\n",
"    - terminieren / terminiert\n",
"    - Absprache\n",
"    - melden\n",
"    - telefonisch\n",
"    - mitteilen\n",
"- Störbild (fault symptoms):\n",
"    - defekt\n",
"    - kaputt\n",
"    - Geräusch\n",
"    - undicht\n",
"    - leckt\n",
"    - Dichtigkeit\n",
"- Abteilung (department):\n",
"    - Buchhaltung\n",
"    - Betriebstechnik\n",
"    - Entwicklung\n",
"- Ort (location):\n",
"    - Kesselhaus\n",
"    - Durchfahrt\n",
"    - Dach\n",
"    - Haupteingang\n",
"    - Werkbank\n",
"    - Schlosserei"
]
},
|
||
{
"cell_type": "code",
"execution_count": 272,
"metadata": {},
"outputs": [],
"source": [
"# group label -> (lower-case) words assigned to this group\n",
"word_2_map = {\n",
"    'Prüfung': ['prüfen', 'sichtprüfung', 'überprüfung', 'überprüfen',\n",
"                'kontrolle', 'kontrollieren', 'sicherstellen', 'sicherstellung',\n",
"                'reinigung', 'reinigen', 'prüfbericht', 'sichtkontrolle',\n",
"                'rundgang', 'technikrundgang'],\n",
"    'Wartung': ['wartung', 'warten', 'wartungstätigkeit', 'wartungsarbeit',\n",
"                'wartungsplan'],\n",
"    'Handlung': ['schmierung', 'schmieren', 'reinigen', 'reinigung',\n",
"                 'schneiden', 'nachschneiden'],\n",
"    'zyklisch': ['täglich', 'tägliche', 'täglicher', 'wöchentlich', 'wöchentliche', 'monatlich', 'jährlich',\n",
"                 'halbjährlich', 'monatliche', 'wartungsintervall'],\n",
"    'Datum': ['uhr', 'montag', 'dienstag', 'mittwoch', 'donnerstag',\n",
"              'freitag', 'samstag', 'sonntag'],\n",
"    'Kleinteile': ['schraube', 'adapter', 'halterung', 'scheibe', 'gewinde',\n",
"                   'ventil', 'schalter', 'befestigungsschraube'],\n",
"    'Komponenten': ['kupplung', 'motor', 'getriebe', 'ventilator',\n",
"                    'zahnriemen', 'transformator', 'filterelement',\n",
"                    'dosierpumpe', 'luftschlauch', 'dichtung', 'filter',\n",
"                    'scharnier', 'spannrolle', 'druckluftbehälter', 'kette',\n",
"                    'anschlüsse', 'anschluss', 'schläuche', 'schlauch', 'beleuchtung'],\n",
"    'Elektrik': ['zuleitung', 'kabel', 'steckdose', 'elektriker',\n",
"                 'elektronik', 'elektrisch', 'sicherheitsbeleuchtung'],\n",
"    'Anlagen': ['anlage', 'mischanlage', 'maschine', 'klimaanlage', 'filteranlage',\n",
"                'wasserenthärtungsanlage', 'lüftungsanlage', 'wasseraufbereitungsanlage'],\n",
"    'Vereinbarung': ['wartungsvertrag', 'neuvertrag', 'vertrag', 'terminieren',\n",
"                     'terminiert', 'absprache', 'melden', 'telefonisch', 'mitteilen'],\n",
"    'Störbild': ['defekt', 'kaputt', 'geräusch', 'undicht', 'leckt', 'dichtigkeit'],\n",
"    'Abteilung': ['buchhaltung', 'betriebstechnik', 'entwicklung'],\n",
"    'Ort': ['kesselhaus', 'durchfahrt', 'dach',\n",
"            'haupteingang', 'werkbank', 'schlosserei'],\n",
"}"
]
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Question: Is there a way to classify terms automatically?\n",
"    - e.g. automatically flagging whether a term denotes a component or not"
]
},
|
||
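One possible rule-based starting point for this question exploits the fact that German compounds usually end in their head noun (e.g. `befestigungsschraube` ends in `schraube`): assign a term to the group of the longest known group word it ends with. This is only a heuristic sketch, not a tested answer; `build_lookup`, `classify_term` and the small excerpt of `word_2_map` are illustrative:

```python
from typing import Optional


def build_lookup(word_2_map: dict[str, list[str]]) -> dict[str, str]:
    """Invert group -> words into word -> group (the last group wins on duplicates)."""
    return {word: group for group, words in word_2_map.items() for word in words}


def classify_term(term: str, lookup: dict[str, str]) -> Optional[str]:
    """Return the group of the longest known word that `term` ends with, else None."""
    term = term.lower()
    # start with the whole term, then try ever shorter suffixes,
    # so the first hit is the longest matching head word
    for start in range(len(term)):
        suffix = term[start:]
        if suffix in lookup:
            return lookup[suffix]
    return None


# illustrative excerpt of word_2_map
word_2_map_excerpt = {
    'Kleinteile': ['schraube', 'ventil', 'scheibe'],
    'Komponenten': ['motor', 'filter', 'zahnriemen'],
    'Anlagen': ['anlage', 'klimaanlage'],
}
lookup = build_lookup(word_2_map_excerpt)

for term in ['Befestigungsschraube', 'Kegelradgetriebemotor', 'Lüftungsanlage', 'Kesselhaus']:
    print(term, '->', classify_term(term, lookup))
```

Short group words can produce false positives, and terms whose head noun is not in the mapping (here `Kesselhaus`) stay unclassified, so the result would still need manual review.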
{
"cell_type": "code",
"execution_count": 273,
"metadata": {},
"outputs": [],
"source": [
"# invert word_2_map into word -> group label\n",
"# note: a word listed in several groups is assigned to the group that comes last\n",
"map_2_word = dict()\n",
"\n",
"for key, word_list in word_2_map.items():\n",
"    for word in word_list:\n",
"        map_2_word[word] = key"
]
},
|
||
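Because `map_2_word` is filled group by group, a word listed in several groups silently ends up in the group that comes last in `word_2_map`. In the current mapping this affects `reinigung` and `reinigen`, which appear under both `Prüfung` and `Handlung` and therefore map to `Handlung`. A small check such as the following (a sketch; `find_multi_group_words` is a hypothetical helper) lists such words so the mapping can be made unambiguous:

```python
from collections import defaultdict


def find_multi_group_words(word_2_map: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return every word that is assigned to more than one group."""
    groups_per_word: dict[str, list[str]] = defaultdict(list)
    for group, words in word_2_map.items():
        for word in words:
            groups_per_word[word].append(group)
    # keep only words with more than one group, in first-seen order
    return {word: groups for word, groups in groups_per_word.items() if len(groups) > 1}


example = {
    'Prüfung': ['prüfen', 'kontrolle', 'reinigung', 'reinigen'],
    'Handlung': ['schmierung', 'reinigen', 'reinigung'],
}
print(find_multi_group_words(example))
# {'reinigung': ['Prüfung', 'Handlung'], 'reinigen': ['Prüfung', 'Handlung']}
```

Running it on the full `word_2_map` before building `map_2_word` makes overlaps explicit instead of depending on dictionary order.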
{
"cell_type": "code",
"execution_count": 274,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:base:Number of entries processed: 1, Percent completed: 0.01\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:base:Number of entries processed: 501, Percent completed: 7.42\n",
"INFO:base:Number of entries processed: 1001, Percent completed: 14.82\n",
"INFO:base:Number of entries processed: 1501, Percent completed: 22.23\n",
"INFO:base:Number of entries processed: 2001, Percent completed: 29.63\n",
"INFO:base:Number of entries processed: 2501, Percent completed: 37.04\n",
"INFO:base:Number of entries processed: 3001, Percent completed: 44.44\n",
"INFO:base:Number of entries processed: 3501, Percent completed: 51.84\n",
"INFO:base:Number of entries processed: 4001, Percent completed: 59.25\n",
"INFO:base:Number of entries processed: 4501, Percent completed: 66.65\n",
"INFO:base:Number of entries processed: 5001, Percent completed: 74.06\n",
"INFO:base:Number of entries processed: 5501, Percent completed: 81.46\n",
"INFO:base:Number of entries processed: 6001, Percent completed: 88.86\n",
"INFO:base:Number of entries processed: 6501, Percent completed: 96.27\n"
]
}
],
"source": [
"# collect weighted token connections (edges) that later form the adjacency matrix\n",
"connections = dict()\n",
"unique_tokens = set()\n",
"UPDATE_STATUS = 500\n",
"length_data = len(descr)\n",
"spell_check_candidates = set()\n",
"spell_checker = SpellChecker(language='de', distance=1)\n",
"\n",
"if not LOAD_CALC_FILES or IS_TEST:\n",
"    for count, (_, row) in enumerate(descr.iterrows()):\n",
"        text = row['descr']\n",
"        # each description is weighted by how often it occurs in the data\n",
"        weight = row['num_occur']\n",
"\n",
"        doc = nlp(text)\n",
"\n",
"        obtain_descendant_info(\n",
"            doc=doc,\n",
"            weight=weight,\n",
"            POS_of_interest=POS_of_interest,\n",
"            TAG_of_interest=TAG_of_interest,\n",
"            connections=connections,\n",
"            unique_tokens=unique_tokens,\n",
"            spell_check_candidates=spell_check_candidates,\n",
"            spell_check_whitelist=spell_check_whitelist,\n",
"            spell_checker=spell_checker,\n",
"            corrections=corrections,\n",
"            map_2_word=map_2_word,\n",
"        )\n",
"\n",
"        if count % UPDATE_STATUS == 0:\n",
"            logger.info(f'Number of entries processed: {count+1}, Percent completed: {((count+1) / length_data) * 100:.2f}')"
]
},
|
||
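The loop above only collects `connections` and `unique_tokens`; the matrix itself is built in the next cell by `obtain_adj_matrix` and `make_undir_adj_matrix`, whose definitions are not part of this section. Assuming `connections` maps a `(source, target)` token pair to an accumulated integer weight (for example the summed `num_occur` of all descriptions containing that edge), a minimal pandas sketch of both steps could look like this. The `_sketch` functions, the data layout, and the choice to symmetrise by summing both directions are assumptions, not the original helpers:

```python
import pandas as pd


def adj_matrix_sketch(unique_tokens: set[str],
                      connections: dict[tuple[str, str], int]) -> pd.DataFrame:
    """Directed weighted adjacency matrix: rows are source tokens, columns target tokens."""
    tokens = sorted(unique_tokens)
    adj = pd.DataFrame(0, index=tokens, columns=tokens, dtype='int64')
    for (source, target), weight in connections.items():
        adj.loc[source, target] += weight
    return adj


def undirected_sketch(adj: pd.DataFrame) -> pd.DataFrame:
    """Symmetrise by summing both edge directions (A + A^T); self-loops are doubled."""
    return adj + adj.T


# toy data, named so that the notebook's real variables are not overwritten
toy_tokens = {'überziehen', 'Contrawalze', 'prüfen'}
toy_connections = {('überziehen', 'Contrawalze'): 3, ('prüfen', 'Contrawalze'): 2}

toy_adj = adj_matrix_sketch(toy_tokens, toy_connections)
toy_undir = undirected_sketch(toy_adj)
print(toy_undir)
```

A dense DataFrame is easy to inspect, but with several thousand tokens (the matrix below has 6845 rows and columns) most entries are zero, so a sparse representation may be worth considering.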
{
|
||
"cell_type": "code",
|
||
"execution_count": 275,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"ADJ_DF_PATH = './Graphanalyse/adj_mat_df.fth'\n",
|
||
"if not IS_TEST:\n",
|
||
" if LOAD_CALC_FILES:\n",
|
||
" adj_mat_undir = pd.read_feather(ADJ_DF_PATH)\n",
|
||
" adj_mat_undir = adj_mat_undir.set_index('index')\n",
|
||
" # additional information\n",
|
||
" connections = load_pickle('connections.pkl')\n",
|
||
" unique_tokens = load_pickle('unique_tokens.pkl')\n",
|
||
" else:\n",
|
||
" adj_mat = obtain_adj_matrix(unique_tokens=unique_tokens, connections=connections)\n",
|
||
" adj_mat_undir = make_undir_adj_matrix(adj_mat=adj_mat)\n",
|
||
" save_df = adj_mat_undir.reset_index()\n",
|
||
" save_df.to_feather(ADJ_DF_PATH)\n",
|
||
" # additional information\n",
|
||
" save_pickle(obj=connections, path='connections.pkl')\n",
|
||
" save_pickle(obj=unique_tokens, path='unique_tokens.pkl')"
|
||
]
|
||
},
|
||
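`save_pickle` and `load_pickle` are helpers defined outside this section. If they are not available, a minimal standard-library version with the same parameter names as used above (`obj`, `path`) might look like this; it is a sketch, not the original helpers:

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Union


def save_pickle(obj: Any, path: Union[str, Path]) -> None:
    """Serialise `obj` to `path` with pickle."""
    with open(path, 'wb') as file:
        pickle.dump(obj, file, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path: Union[str, Path]) -> Any:
    """Load a pickled object; only unpickle files you created yourself."""
    with open(path, 'rb') as file:
        return pickle.load(file)


# round trip in a temporary directory so that no notebook files are touched
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / 'unique_tokens.pkl'
    save_pickle(obj={'prüfen', 'Motor'}, path=target)
    restored = load_pickle(target)

print(restored == {'prüfen', 'Motor'})  # True
```

Pickle keeps Python sets and tuple keys intact, which Feather (used for the matrix itself) cannot store directly.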
{
|
||
"cell_type": "code",
|
||
"execution_count": 276,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Klübertemp</th>\n",
|
||
" <th>Schusssuche</th>\n",
|
||
" <th>Laser</th>\n",
|
||
" <th>Schaftteile</th>\n",
|
||
" <th>Dichtsätz</th>\n",
|
||
" <th>Tastatur</th>\n",
|
||
" <th>Vorspuleinheit</th>\n",
|
||
" <th>beginnen</th>\n",
|
||
" <th>auslesen</th>\n",
|
||
" <th>Kettspannung</th>\n",
|
||
" <th>...</th>\n",
|
||
" <th>Tänzerwalze</th>\n",
|
||
" <th>Abfallkante</th>\n",
|
||
" <th>rappeln</th>\n",
|
||
" <th>Rottenegger</th>\n",
|
||
" <th>Contrawalze</th>\n",
|
||
" <th>Eisenträger</th>\n",
|
||
" <th>Hängegurte</th>\n",
|
||
" <th>Treffen</th>\n",
|
||
" <th>Greiferarmen</th>\n",
|
||
" <th>Nadelleist</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>A</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>ACHTUNG</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>ACServomotor</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>AForm</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>AIB</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>überziech</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>überziehen</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>17</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>überzogen</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>üblich</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>üperprüfen</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>6845 rows × 6845 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Klübertemp Schusssuche Laser Schaftteile Dichtsätz \\\n",
|
||
"A 0 0 0 0 0 \n",
|
||
"ACHTUNG 0 0 0 0 0 \n",
|
||
"ACServomotor 0 0 0 0 0 \n",
|
||
"AForm 0 0 0 0 0 \n",
|
||
"AIB 0 0 0 0 0 \n",
|
||
"... ... ... ... ... ... \n",
|
||
"überziech 0 0 0 0 0 \n",
|
||
"überziehen 0 0 0 0 0 \n",
|
||
"überzogen 0 0 0 0 0 \n",
|
||
"üblich 0 0 0 0 0 \n",
|
||
"üperprüfen 0 0 0 0 0 \n",
|
||
"\n",
|
||
" Tastatur Vorspuleinheit beginnen auslesen Kettspannung ... \\\n",
|
||
"A 0 0 0 0 0 ... \n",
|
||
"ACHTUNG 0 0 0 0 0 ... \n",
|
||
"ACServomotor 0 0 0 0 0 ... \n",
|
||
"AForm 0 0 0 0 0 ... \n",
|
||
"AIB 0 0 0 0 0 ... \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"überziech 0 0 0 0 0 ... \n",
|
||
"überziehen 0 0 0 0 0 ... \n",
|
||
"überzogen 0 0 0 0 0 ... \n",
|
||
"üblich 0 0 0 0 0 ... \n",
|
||
"üperprüfen 0 0 0 0 0 ... \n",
|
||
"\n",
|
||
" Tänzerwalze Abfallkante rappeln Rottenegger Contrawalze \\\n",
|
||
"A 0 0 0 0 0 \n",
|
||
"ACHTUNG 0 0 0 0 0 \n",
|
||
"ACServomotor 0 0 0 0 0 \n",
|
||
"AForm 0 0 0 0 0 \n",
|
||
"AIB 0 0 0 0 0 \n",
|
||
"... ... ... ... ... ... \n",
|
||
"überziech 0 0 0 0 0 \n",
|
||
"überziehen 0 0 0 0 17 \n",
|
||
"überzogen 0 0 0 0 6 \n",
|
||
"üblich 0 0 0 0 0 \n",
|
||
"üperprüfen 0 0 0 0 0 \n",
|
||
"\n",
|
||
" Eisenträger Hängegurte Treffen Greiferarmen Nadelleist \n",
|
||
"A 0 0 0 0 0 \n",
|
||
"ACHTUNG 0 0 0 0 0 \n",
|
||
"ACServomotor 0 0 0 0 0 \n",
|
||
"AForm 0 0 0 0 0 \n",
|
||
"AIB 0 0 0 0 0 \n",
|
||
"... ... ... ... ... ... \n",
|
||
"überziech 0 0 0 0 0 \n",
|
||
"überziehen 0 0 0 0 0 \n",
|
||
"überzogen 0 0 0 0 0 \n",
|
||
"üblich 0 0 0 0 0 \n",
|
||
"üperprüfen 0 0 0 0 0 \n",
|
||
"\n",
|
||
"[6845 rows x 6845 columns]"
|
||
]
|
||
},
|
||
"execution_count": 276,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"adj_mat_undir.sort_index()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Test: cosine similarity\n",
"- Build a matrix of similarity scores for every word pair (upper triangular matrix)\n",
"- Filter the table by a threshold\n",
"- Use the weighted adjacency matrix with a threshold as a mask\n",
"  - analyse only high-weight groups\n",
"- Analyse the relationships as a graph (similar to the previous approach)\n",
"- Form groups and name them (e.g. Prüfung+Überprüfung+Kontrolle --> Überprüfung)\n",
"- Build a dictionary from these groups and match terms against it during construction"
|
||
]
|
||
},
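{
"cell_type": "markdown",
"metadata": {},
"source": [
"The last two steps of the plan (form groups, build a dictionary) could look roughly like this. This is a minimal sketch under stated assumptions, not necessarily the approach of this notebook: it assumes the thresholded similarity matrix is available as a NumPy array in the same word order as `words`, and it groups words via the connected components (union-find) of the graph formed by the nonzero entries. The helper names `find_groups` and `build_synonym_dict` are hypothetical, and the group names are still chosen by hand.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def find_groups(words, sim_mat):\n",
"    # hypothetical helper: words = list of n words, sim_mat = (n, n) array that has\n",
"    # already been thresholded; nonzero entry (i, j) = words i and j belong together\n",
"    parent = list(range(len(words)))\n",
"\n",
"    def find(i):\n",
"        # root of the set containing i (union-find with path halving)\n",
"        while parent[i] != i:\n",
"            parent[i] = parent[parent[i]]\n",
"            i = parent[i]\n",
"        return i\n",
"\n",
"    rows, cols = np.nonzero(sim_mat)\n",
"    for i, j in zip(rows.tolist(), cols.tolist()):\n",
"        root_i, root_j = find(i), find(j)\n",
"        if root_i != root_j:\n",
"            parent[root_j] = root_i  # merge the two sets\n",
"\n",
"    groups = {}\n",
"    for idx, word in enumerate(words):\n",
"        groups.setdefault(find(idx), []).append(word)\n",
"    # only groups with at least two words are candidates for a synonym group\n",
"    return [sorted(group) for group in groups.values() if len(group) > 1]\n",
"\n",
"\n",
"def build_synonym_dict(groups, names):\n",
"    # hypothetical helper: map every word of a group to its manually chosen group name\n",
"    return {word: name for group, name in zip(groups, names) for word in group}\n",
"\n",
"\n",
"# toy example with made-up similarities\n",
"words = ['Prüfung', 'Überprüfung', 'Kontrolle', 'Laser']\n",
"sim = np.zeros((4, 4))\n",
"sim[0, 1] = 0.7  # Prüfung - Überprüfung\n",
"sim[1, 2] = 0.5  # Überprüfung - Kontrolle\n",
"groups = find_groups(words, sim)\n",
"synonyms = build_synonym_dict(groups, ['Überprüfung'])\n",
"```\n",
"\n",
"Connected components are the simplest grouping rule: a chain of similar pairs (A-B, B-C) puts A and C into the same group even if A and C are not similar themselves, so very large groups may call for a stricter threshold."
]
},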
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 49,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
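"from itertools import combinations\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"\n",
"def build_cosine_similarity_matrix(\n",
"    adj_mat\n",
"):\n",
"    # upper triangular matrix of pairwise cosine similarities of the spaCy word\n",
"    # vectors of all words in adj_mat's index (uses the global `nlp` pipeline)\n",
"    # obtain words to compare\n",
"    words = adj_mat.index.to_list()\n",
"    \n",
"    # cosine similarity matrix, initialised with zeros\n",
"    cos_mat = pd.DataFrame(\n",
"        data=0., \n",
"        columns=words, \n",
"        index=words,\n",
"        dtype=np.float32,\n",
"    )\n",
"    # look up each lexeme once instead of once per word pair\n",
"    lexemes = {word: nlp.vocab[str(word)] for word in words}\n",
"    \n",
"    # every unordered pair exactly once -> upper triangle (in index order)\n",
"    for (word1, word2) in combinations(words, 2):\n",
"        w1 = lexemes[word1]\n",
"        w2 = lexemes[word2]\n",
"        # words without a vector keep the value 0 (spaCy would return 0.0\n",
"        # here anyway and emit warning W008)\n",
"        if not (w1.has_vector and w2.has_vector):\n",
"            continue\n",
"        # calculate cosine similarity\n",
"        cos_sim = w1.similarity(w2)\n",
"        # set value\n",
"        cos_mat.at[word1, word2] = cos_sim\n",
"    \n",
"    return cos_mat"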
|
||
]
|
||
},
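{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above evaluates every word pair in Python (about 24 million pairs for 6951 words), which is slow. A vectorized alternative is sketched below, assuming the word vectors are stacked into one NumPy array in index order; the function name `cosine_upper_triangle` is hypothetical. It computes all similarities with a single matrix product and keeps only the upper triangle, so it should agree with `build_cosine_similarity_matrix` up to floating-point rounding. Note that one 6951 x 6951 float32 array already needs roughly 190 MB of memory.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def cosine_upper_triangle(vectors):\n",
"    # hypothetical helper: vectors = array of shape (n_words, dim), one word vector per row\n",
"    vectors = np.asarray(vectors, dtype=np.float32)\n",
"    norms = np.linalg.norm(vectors, axis=1)\n",
"    # zero vectors (words without a vector) are divided by 1 and stay zero\n",
"    safe_norms = np.where(norms > 0, norms, 1.0).astype(np.float32)\n",
"    unit = vectors / safe_norms[:, None]\n",
"    # all pairwise cosine similarities with one matrix product\n",
"    sims = unit @ unit.T\n",
"    # keep only pairs (i, j) with i < j, like combinations(words, 2)\n",
"    return np.triu(sims, k=1)\n",
"\n",
"\n",
"# possible usage in this notebook (assumption, not executed here):\n",
"# words = adj_mat_undir.index.to_list()\n",
"# vecs = np.stack([nlp.vocab[str(w)].vector for w in words])\n",
"# cos_mat = pd.DataFrame(cosine_upper_triangle(vecs), index=words, columns=words)\n",
"sims = cosine_upper_triangle([[1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])\n",
"```"
]
},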
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 50,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stderr",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"C:\\Users\\foersterflorian\\AppData\\Local\\Temp\\ipykernel_17216\\213623562.py:20: UserWarning: [W008] Evaluating Lexeme.similarity based on empty vectors.\n",
|
||
" cos_sim = w1.similarity(w2)\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"cos_mat = build_cosine_similarity_matrix(adj_mat=adj_mat_undir)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 52,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Klübertemp</th>\n",
|
||
" <th>Schusssuche</th>\n",
|
||
" <th>Laser</th>\n",
|
||
" <th>Schaftteile</th>\n",
|
||
" <th>Dichtsätz</th>\n",
|
||
" <th>Tastatur</th>\n",
|
||
" <th>Vorspuleinheit</th>\n",
|
||
" <th>beginnen</th>\n",
|
||
" <th>auslesen</th>\n",
|
||
" <th>Kettspannung</th>\n",
|
||
" <th>...</th>\n",
|
||
" <th>Tänzerwalze</th>\n",
|
||
" <th>Abfallkante</th>\n",
|
||
" <th>rappeln</th>\n",
|
||
" <th>Rottenegger</th>\n",
|
||
" <th>Contrawalze</th>\n",
|
||
" <th>Eisenträger</th>\n",
|
||
" <th>Hängegurte</th>\n",
|
||
" <th>Treffen</th>\n",
|
||
" <th>Greiferarmen</th>\n",
|
||
" <th>Nadelleist</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Klübertemp</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Schusssuche</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Laser</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.324276</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.059743</td>\n",
|
||
" <td>0.133676</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>-0.063913</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.167521</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>-0.029860</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Schaftteile</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Dichtsätz</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Eisenträger</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.170954</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Hängegurte</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Treffen</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Greiferarmen</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Nadelleist</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>6951 rows × 6951 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Klübertemp Schusssuche Laser Schaftteile Dichtsätz \\\n",
|
||
"Klübertemp 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"Schusssuche 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"Laser 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"Schaftteile 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"Dichtsätz 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"... ... ... ... ... ... \n",
|
||
"Eisenträger 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"Hängegurte 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"Treffen 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"Greiferarmen 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"Nadelleist 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"\n",
|
||
" Tastatur Vorspuleinheit beginnen auslesen Kettspannung ... \\\n",
|
||
"Klübertemp 0.000000 0.0 0.000000 0.000000 0.0 ... \n",
|
||
"Schusssuche 0.000000 0.0 0.000000 0.000000 0.0 ... \n",
|
||
"Laser 0.324276 0.0 0.059743 0.133676 0.0 ... \n",
|
||
"Schaftteile 0.000000 0.0 0.000000 0.000000 0.0 ... \n",
|
||
"Dichtsätz 0.000000 0.0 0.000000 0.000000 0.0 ... \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"Eisenträger 0.000000 0.0 0.000000 0.000000 0.0 ... \n",
|
||
"Hängegurte 0.000000 0.0 0.000000 0.000000 0.0 ... \n",
|
||
"Treffen 0.000000 0.0 0.000000 0.000000 0.0 ... \n",
|
||
"Greiferarmen 0.000000 0.0 0.000000 0.000000 0.0 ... \n",
|
||
"Nadelleist 0.000000 0.0 0.000000 0.000000 0.0 ... \n",
|
||
"\n",
|
||
" Tänzerwalze Abfallkante rappeln Rottenegger Contrawalze \\\n",
|
||
"Klübertemp 0.0 0.0 0.000000 0.0 0.0 \n",
|
||
"Schusssuche 0.0 0.0 0.000000 0.0 0.0 \n",
|
||
"Laser 0.0 0.0 -0.063913 0.0 0.0 \n",
|
||
"Schaftteile 0.0 0.0 0.000000 0.0 0.0 \n",
|
||
"Dichtsätz 0.0 0.0 0.000000 0.0 0.0 \n",
|
||
"... ... ... ... ... ... \n",
|
||
"Eisenträger 0.0 0.0 0.000000 0.0 0.0 \n",
|
||
"Hängegurte 0.0 0.0 0.000000 0.0 0.0 \n",
|
||
"Treffen 0.0 0.0 0.000000 0.0 0.0 \n",
|
||
"Greiferarmen 0.0 0.0 0.000000 0.0 0.0 \n",
|
||
"Nadelleist 0.0 0.0 0.000000 0.0 0.0 \n",
|
||
"\n",
|
||
" Eisenträger Hängegurte Treffen Greiferarmen Nadelleist \n",
|
||
"Klübertemp 0.000000 0.0 0.000000 0.0 0.0 \n",
|
||
"Schusssuche 0.000000 0.0 0.000000 0.0 0.0 \n",
|
||
"Laser 0.167521 0.0 -0.029860 0.0 0.0 \n",
|
||
"Schaftteile 0.000000 0.0 0.000000 0.0 0.0 \n",
|
||
"Dichtsätz 0.000000 0.0 0.000000 0.0 0.0 \n",
|
||
"... ... ... ... ... ... \n",
|
||
"Eisenträger 0.000000 0.0 0.170954 0.0 0.0 \n",
|
||
"Hängegurte 0.000000 0.0 0.000000 0.0 0.0 \n",
|
||
"Treffen 0.000000 0.0 0.000000 0.0 0.0 \n",
|
||
"Greiferarmen 0.000000 0.0 0.000000 0.0 0.0 \n",
|
||
"Nadelleist 0.000000 0.0 0.000000 0.0 0.0 \n",
|
||
"\n",
|
||
"[6951 rows x 6951 columns]"
|
||
]
|
||
},
|
||
"execution_count": 52,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"cos_mat"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 635,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
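"# minimum edge weight in the undirected adjacency matrix\n",
"WEIGHT_THRESHOLD = 10\n",
"# minimum cosine similarity between two word vectors\n",
"COS_THRESHOLD = 0.4\n",
"# both arrays must share the same word order (cos_mat was built from adj_mat_undir's index)\n",
"arr = adj_mat_undir.to_numpy()\n",
"cos_arr = cos_mat.to_numpy()"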
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 636,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
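"# keep the cosine similarity only for word pairs above COS_THRESHOLD whose edge weight\n",
"# reaches WEIGHT_THRESHOLD; all other entries are set to 0\n",
"cos_arr_filt = np.where((cos_arr > COS_THRESHOLD) & (arr >= WEIGHT_THRESHOLD), cos_arr, 0)"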
|
||
]
|
||
},
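{
"cell_type": "markdown",
"metadata": {},
"source": [
"The combined mask keeps a word pair only if both conditions hold. The toy example below uses made-up numbers (`weights` and `cos_upper` are illustrative names, not notebook variables). Because `cos_arr` is upper triangular, `np.count_nonzero(cos_arr_filt)` counts each remaining pair once; mirroring with `np.maximum` is an optional extra step, not part of the original code, for the case that a symmetric matrix is needed.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"WEIGHT_THRESHOLD = 10\n",
"COS_THRESHOLD = 0.4\n",
"\n",
"# made-up data: symmetric edge weights, upper triangular cosine similarities\n",
"weights = np.array([[0, 12, 3], [12, 0, 20], [3, 20, 0]])\n",
"cos_upper = np.array([[0.0, 0.55, 0.9], [0.0, 0.0, 0.3], [0.0, 0.0, 0.0]], dtype=np.float32)\n",
"\n",
"# pair (0, 1): both conditions hold -> kept\n",
"# pair (0, 2): similar, but weight 3 < 10 -> dropped\n",
"# pair (1, 2): weight 20, but similarity 0.3 <= 0.4 -> dropped\n",
"filtered = np.where((cos_upper > COS_THRESHOLD) & (weights >= WEIGHT_THRESHOLD), cos_upper, 0)\n",
"\n",
"# optional: mirror the upper triangle to obtain a symmetric matrix\n",
"filtered_sym = np.maximum(filtered, filtered.T)\n",
"```"
]
},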
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 637,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([[0., 0., 0., ..., 0., 0., 0.],\n",
|
||
" [0., 0., 0., ..., 0., 0., 0.],\n",
|
||
" [0., 0., 0., ..., 0., 0., 0.],\n",
|
||
" ...,\n",
|
||
" [0., 0., 0., ..., 0., 0., 0.],\n",
|
||
" [0., 0., 0., ..., 0., 0., 0.],\n",
|
||
" [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)"
|
||
]
|
||
},
|
||
"execution_count": 637,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"cos_arr_filt"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 638,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"217"
|
||
]
|
||
},
|
||
"execution_count": 638,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"np.count_nonzero(cos_arr_filt)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 639,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# DataFrame with the same word labels as cos_mat, filled with the filtered values\n",
"thresh_cos_mat = cos_mat.copy()\n",
"thresh_cos_mat[:] = cos_arr_filt"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 640,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Verstärkung</th>\n",
|
||
" <th>Zuluftfilter</th>\n",
|
||
" <th>klemmt</th>\n",
|
||
" <th>Komminikation</th>\n",
|
||
" <th>Doppelholztische</th>\n",
|
||
" <th>Deckenbeleuchtung</th>\n",
|
||
" <th>Abfalltransport</th>\n",
|
||
" <th>fahrbar</th>\n",
|
||
" <th>Folieneinlauf</th>\n",
|
||
" <th>entsorgen</th>\n",
|
||
" <th>...</th>\n",
|
||
" <th>neuwertig</th>\n",
|
||
" <th>Bleit</th>\n",
|
||
" <th>Rauchentwicklung</th>\n",
|
||
" <th>Kompressorsteuerung</th>\n",
|
||
" <th>anziehen</th>\n",
|
||
" <th>Mitarbeiterin</th>\n",
|
||
" <th>Nägel</th>\n",
|
||
" <th>WZ</th>\n",
|
||
" <th>ExSchutzAnlage</th>\n",
|
||
" <th>Gemisch</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Verstärkung</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Zuluftfilter</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>klemmt</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Komminikation</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Doppelholztische</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Mitarbeiterin</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Nägel</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>WZ</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>ExSchutzAnlage</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Gemisch</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>6951 rows × 6951 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Verstärkung Zuluftfilter klemmt Komminikation \\\n",
|
||
"Verstärkung 0.0 0.0 0.0 0.0 \n",
|
||
"Zuluftfilter 0.0 0.0 0.0 0.0 \n",
|
||
"klemmt 0.0 0.0 0.0 0.0 \n",
|
||
"Komminikation 0.0 0.0 0.0 0.0 \n",
|
||
"Doppelholztische 0.0 0.0 0.0 0.0 \n",
|
||
"... ... ... ... ... \n",
|
||
"Mitarbeiterin 0.0 0.0 0.0 0.0 \n",
|
||
"Nägel 0.0 0.0 0.0 0.0 \n",
|
||
"WZ 0.0 0.0 0.0 0.0 \n",
|
||
"ExSchutzAnlage 0.0 0.0 0.0 0.0 \n",
|
||
"Gemisch 0.0 0.0 0.0 0.0 \n",
|
||
"\n",
|
||
" Doppelholztische Deckenbeleuchtung Abfalltransport \\\n",
|
||
"Verstärkung 0.0 0.0 0.0 \n",
|
||
"Zuluftfilter 0.0 0.0 0.0 \n",
|
||
"klemmt 0.0 0.0 0.0 \n",
|
||
"Komminikation 0.0 0.0 0.0 \n",
|
||
"Doppelholztische 0.0 0.0 0.0 \n",
|
||
"... ... ... ... \n",
|
||
"Mitarbeiterin 0.0 0.0 0.0 \n",
|
||
"Nägel 0.0 0.0 0.0 \n",
|
||
"WZ 0.0 0.0 0.0 \n",
|
||
"ExSchutzAnlage 0.0 0.0 0.0 \n",
|
||
"Gemisch 0.0 0.0 0.0 \n",
|
||
"\n",
|
||
" fahrbar Folieneinlauf entsorgen ... neuwertig Bleit \\\n",
|
||
"Verstärkung 0.0 0.0 0.0 ... 0.0 0.0 \n",
|
||
"Zuluftfilter 0.0 0.0 0.0 ... 0.0 0.0 \n",
|
||
"klemmt 0.0 0.0 0.0 ... 0.0 0.0 \n",
|
||
"Komminikation 0.0 0.0 0.0 ... 0.0 0.0 \n",
|
||
"Doppelholztische 0.0 0.0 0.0 ... 0.0 0.0 \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"Mitarbeiterin 0.0 0.0 0.0 ... 0.0 0.0 \n",
|
||
"Nägel 0.0 0.0 0.0 ... 0.0 0.0 \n",
|
||
"WZ 0.0 0.0 0.0 ... 0.0 0.0 \n",
|
||
"ExSchutzAnlage 0.0 0.0 0.0 ... 0.0 0.0 \n",
|
||
"Gemisch 0.0 0.0 0.0 ... 0.0 0.0 \n",
|
||
"\n",
|
||
" Rauchentwicklung Kompressorsteuerung anziehen \\\n",
|
||
"Verstärkung 0.0 0.0 0.0 \n",
|
||
"Zuluftfilter 0.0 0.0 0.0 \n",
|
||
"klemmt 0.0 0.0 0.0 \n",
|
||
"Komminikation 0.0 0.0 0.0 \n",
|
||
"Doppelholztische 0.0 0.0 0.0 \n",
|
||
"... ... ... ... \n",
|
||
"Mitarbeiterin 0.0 0.0 0.0 \n",
|
||
"Nägel 0.0 0.0 0.0 \n",
|
||
"WZ 0.0 0.0 0.0 \n",
|
||
"ExSchutzAnlage 0.0 0.0 0.0 \n",
|
||
"Gemisch 0.0 0.0 0.0 \n",
|
||
"\n",
|
||
" Mitarbeiterin Nägel WZ ExSchutzAnlage Gemisch \n",
|
||
"Verstärkung 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"Zuluftfilter 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"klemmt 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"Komminikation 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"Doppelholztische 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"... ... ... ... ... ... \n",
|
||
"Mitarbeiterin 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"Nägel 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"WZ 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"ExSchutzAnlage 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"Gemisch 0.0 0.0 0.0 0.0 0.0 \n",
|
||
"\n",
|
||
"[6951 rows x 6951 columns]"
|
||
]
|
||
},
|
||
"execution_count": 640,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"thresh_cos_mat"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 641,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
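"# file name encodes both thresholds, e.g. ..._Wthresh_10_Cthresh40.csv\n",
"# (round() avoids truncation errors such as int(0.29*100) == 28)\n",
"COS_MAT_PATH_CSV = f'./Graphanalyse_Gruppen/cos_mat_Wthresh_{WEIGHT_THRESHOLD}_Cthresh{round(COS_THRESHOLD*100)}.csv'\n",
"# semicolon-separated, cp1252-encoded CSV (e.g. for Excel with German locale settings)\n",
"thresh_cos_mat.to_csv(path_or_buf=COS_MAT_PATH_CSV, encoding='cp1252', sep=';')"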
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 603,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"arr = adj_mat_undir.to_numpy()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 604,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"24725"
|
||
]
|
||
},
|
||
"execution_count": 604,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"np.count_nonzero(arr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 605,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"92788"
|
||
]
|
||
},
|
||
"execution_count": 605,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"np.max(arr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 606,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"257"
|
||
]
|
||
},
|
||
"execution_count": 606,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"uni_arr = np.unique(arr)\n",
|
||
"len(uni_arr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Threshold on the edge weights"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 277,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# remove all edges whose weight is below WEIGHT_THRESHOLD\n",
"WEIGHT_THRESHOLD = 50\n",
"arr = adj_mat_undir.to_numpy()\n",
"arr = np.where(arr < WEIGHT_THRESHOLD, 0, arr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 278,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"600"
|
||
]
|
||
},
|
||
"execution_count": 278,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"np.count_nonzero(arr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 279,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"216"
|
||
]
|
||
},
|
||
"execution_count": 279,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# number of words (nodes) that still have at least one edge after thresholding\n",
"temp = np.sum(arr, axis=0)\n",
"np.count_nonzero(temp)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 280,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# adjacency matrix with the original word labels, containing only the remaining edges\n",
"thresh_adj_mat = adj_mat_undir.copy()\n",
"thresh_adj_mat.loc[:] = arr"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 281,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Klübertemp</th>\n",
|
||
" <th>Schusssuche</th>\n",
|
||
" <th>Laser</th>\n",
|
||
" <th>Schaftteile</th>\n",
|
||
" <th>Dichtsätz</th>\n",
|
||
" <th>Tastatur</th>\n",
|
||
" <th>Vorspuleinheit</th>\n",
|
||
" <th>beginnen</th>\n",
|
||
" <th>auslesen</th>\n",
|
||
" <th>Kettspannung</th>\n",
|
||
" <th>...</th>\n",
|
||
" <th>Tänzerwalze</th>\n",
|
||
" <th>Abfallkante</th>\n",
|
||
" <th>rappeln</th>\n",
|
||
" <th>Rottenegger</th>\n",
|
||
" <th>Contrawalze</th>\n",
|
||
" <th>Eisenträger</th>\n",
|
||
" <th>Hängegurte</th>\n",
|
||
" <th>Treffen</th>\n",
|
||
" <th>Greiferarmen</th>\n",
|
||
" <th>Nadelleist</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Klübertemp</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Schusssuche</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Laser</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Schaftteile</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Dichtsätz</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Eisenträger</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Hängegurte</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Treffen</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Greiferarmen</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Nadelleist</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>6845 rows × 6845 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Klübertemp Schusssuche Laser Schaftteile Dichtsätz \\\n",
|
||
"Klübertemp 0 0 0 0 0 \n",
|
||
"Schusssuche 0 0 0 0 0 \n",
|
||
"Laser 0 0 0 0 0 \n",
|
||
"Schaftteile 0 0 0 0 0 \n",
|
||
"Dichtsätz 0 0 0 0 0 \n",
|
||
"... ... ... ... ... ... \n",
|
||
"Eisenträger 0 0 0 0 0 \n",
|
||
"Hängegurte 0 0 0 0 0 \n",
|
||
"Treffen 0 0 0 0 0 \n",
|
||
"Greiferarmen 0 0 0 0 0 \n",
|
||
"Nadelleist 0 0 0 0 0 \n",
|
||
"\n",
|
||
" Tastatur Vorspuleinheit beginnen auslesen Kettspannung ... \\\n",
|
||
"Klübertemp 0 0 0 0 0 ... \n",
|
||
"Schusssuche 0 0 0 0 0 ... \n",
|
||
"Laser 0 0 0 0 0 ... \n",
|
||
"Schaftteile 0 0 0 0 0 ... \n",
|
||
"Dichtsätz 0 0 0 0 0 ... \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"Eisenträger 0 0 0 0 0 ... \n",
|
||
"Hängegurte 0 0 0 0 0 ... \n",
|
||
"Treffen 0 0 0 0 0 ... \n",
|
||
"Greiferarmen 0 0 0 0 0 ... \n",
|
||
"Nadelleist 0 0 0 0 0 ... \n",
|
||
"\n",
|
||
" Tänzerwalze Abfallkante rappeln Rottenegger Contrawalze \\\n",
|
||
"Klübertemp 0 0 0 0 0 \n",
|
||
"Schusssuche 0 0 0 0 0 \n",
|
||
"Laser 0 0 0 0 0 \n",
|
||
"Schaftteile 0 0 0 0 0 \n",
|
||
"Dichtsätz 0 0 0 0 0 \n",
|
||
"... ... ... ... ... ... \n",
|
||
"Eisenträger 0 0 0 0 0 \n",
|
||
"Hängegurte 0 0 0 0 0 \n",
|
||
"Treffen 0 0 0 0 0 \n",
|
||
"Greiferarmen 0 0 0 0 0 \n",
|
||
"Nadelleist 0 0 0 0 0 \n",
|
||
"\n",
|
||
" Eisenträger Hängegurte Treffen Greiferarmen Nadelleist \n",
|
||
"Klübertemp 0 0 0 0 0 \n",
|
||
"Schusssuche 0 0 0 0 0 \n",
|
||
"Laser 0 0 0 0 0 \n",
|
||
"Schaftteile 0 0 0 0 0 \n",
|
||
"Dichtsätz 0 0 0 0 0 \n",
|
||
"... ... ... ... ... ... \n",
|
||
"Eisenträger 0 0 0 0 0 \n",
|
||
"Hängegurte 0 0 0 0 0 \n",
|
||
"Treffen 0 0 0 0 0 \n",
|
||
"Greiferarmen 0 0 0 0 0 \n",
|
||
"Nadelleist 0 0 0 0 0 \n",
|
||
"\n",
|
||
"[6845 rows x 6845 columns]"
|
||
]
|
||
},
|
||
"execution_count": 281,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"thresh_adj_mat"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 282,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"ADJ_MAT_PATH_CSV = f'./Graphanalyse_Gruppen/adj_mat_thresh_mapping_{WEIGHT_THRESHOLD}.csv'\n",
|
||
"thresh_adj_mat.to_csv(path_or_buf=ADJ_MAT_PATH_CSV, encoding='cp1252', sep=';')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"***Testing***"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 208,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"from collections import Counter\n",
"\n",
"important_words = []\n",
"all_entities = []\n",
"pos_counter = Counter()\n",
"token_counter = 0\n",
"relevant_pos = {'NOUN', 'PROPN', 'ADJ', 'ADV'}\n",
"\n",
"# Iterating over a DataFrame yields its column labels (hence 'descr' and 'num_occur' in the\n",
"# outputs below), so iterate over the text column when descr is a DataFrame.\n",
"texts = descr['descr'] if isinstance(descr, pd.DataFrame) else descr\n",
"\n",
"for description in texts:\n",
"    doc = nlp(description)\n",
"\n",
"    for token in doc:\n",
"        POS = token.pos_\n",
"        token_counter += 1\n",
"        pos_counter[POS] += 1\n",
"\n",
"        # keep content words only: no stop words, punctuation or whitespace\n",
"        if (not token.is_stop and not token.is_punct and not token.is_space\n",
"                and POS in relevant_pos):\n",
"            important_words.append((token.lemma_.lower(), POS))\n",
"\n",
"    # named entities recognised by spaCy as (text, label) pairs\n",
"    all_entities.extend((ent.text, ent.label_) for ent in doc.ents)"
]
},
{
"cell_type": "code",
"execution_count": 209,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('descr', 'ADV'), ('num_occur', 'NOUN')]"
]
},
"execution_count": 209,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"important_words"
]
},
{
"cell_type": "code",
"execution_count": 210,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 210,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(important_words)"
]
},
{
"cell_type": "code",
"execution_count": 211,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('descr', 'LOC'), ('num_occur', 'MISC')]"
]
},
"execution_count": 211,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_entities"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"count = Counter(important_words)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Counter({('täglich', 'ADJ'): 3,\n",
" ('prüfung', 'NOUN'): 3,\n",
" ('sichtkontrolle', 'NOUN'): 2,\n",
" ('kontrolle', 'NOUN'): 2,\n",
" ('scharniere', 'NOUN'): 2,\n",
" ('dichtung', 'NOUN'): 2,\n",
" ('schließvorrichtung', 'NOUN'): 2,\n",
" ('schloß', 'NOUN'): 2,\n",
" ('beschlag', 'NOUN'): 2,\n",
" ('allgemein', 'ADJ'): 2,\n",
" ('funktion', 'NOUN'): 2,\n",
" ('schmierung', 'NOUN'): 2,\n",
" ('festhaltevorrichtung', 'NOUN'): 2,\n",
" ('monatliche', 'ADJ'): 2,\n",
" ('wartungstätigkeit', 'NOUN'): 1,\n",
" ('vorgabe', 'NOUN'): 1,\n",
" ('maschinenhersteller', 'NOUN'): 1,\n",
" ('wöchentliche', 'ADJ'): 1,\n",
" ('reinigung', 'NOUN'): 1,\n",
" ('überprüfung', 'NOUN'): 1,\n",
" ('ölabscheider', 'NOUN'): 1,\n",
" ('wöchentlich', 'ADJ'): 1,\n",
" ('wc-anlage', 'NOUN'): 1,\n",
" ('halbjährliche', 'ADJ'): 1,\n",
" ('stabbreithalter', 'NOUN'): 1,\n",
" ('brandschutztechnische', 'ADJ'): 1,\n",
" ('technikrundgang', 'NOUN'): 1})"
]
},
"execution_count": 225,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"count"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"NOUN 25722\n",
"PUNCT 11626\n",
"VERB 9093\n",
"ADP 7211\n",
"ADV 6526\n",
"PROPN 4481\n",
"NUM 4115\n",
"DET 3845\n",
"ADJ 2576\n",
"AUX 2329\n",
"PART 1561\n",
"CCONJ 1305\n",
"X 999\n",
"PRON 916\n",
"SCONJ 385\n",
"SPACE 236\n",
"INTJ 1\n",
"dtype: int64"
]
},
"execution_count": 180,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pos_count = pd.Series(data=pos_counter)\n",
"pos_count.sort_values(ascending=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"NOUN 0.310176\n",
"PUNCT 0.140196\n",
"VERB 0.109651\n",
"ADP 0.086956\n",
"ADV 0.078696\n",
"PROPN 0.054035\n",
"NUM 0.049622\n",
"DET 0.046366\n",
"ADJ 0.031063\n",
"AUX 0.028085\n",
"PART 0.018824\n",
"CCONJ 0.015737\n",
"X 0.012047\n",
"PRON 0.011046\n",
"SCONJ 0.004643\n",
"SPACE 0.002846\n",
"INTJ 0.000012\n",
"dtype: float64"
]
},
"execution_count": 184,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pos_count_rel = pos_count / pos_count.sum()\n",
"pos_count_rel.sort_values(ascending=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"82927"
]
},
"execution_count": 181,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"token_counter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further analysis of the descriptions\n",
"\n",
"- Clarify the unclear relationships in the results for the 1200 threshold:\n",
"  - Find the corresponding descriptions\n",
"  - Put them in context\n",
"- Identify further blacklist words"
]
},
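{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Sketch (an assumption, not part of the original analysis):* the first step above, finding the descriptions behind an unclear word pair, can be approximated without spaCy. Each description is lower-cased and split on commas and whitespace, and a row is kept if it contains every search word. Unlike the lemma-based search further down, inflected forms (e.g. *Wartungstätigkeiten* vs. *wartungstätigkeit*) are **not** matched. The helper `find_descriptions_with_words` and the toy data are hypothetical; the column name `descr` follows `temp1`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"import pandas as pd\n",
"\n",
"\n",
"def find_descriptions_with_words(df: pd.DataFrame, words, text_col: str = 'descr') -> pd.DataFrame:\n",
"    \"\"\"Return the rows of `df` whose text contains every word in `words` (case-insensitive).\n",
"\n",
"    Hypothetical helper: plain tokenisation on commas and whitespace, no lemmatisation.\n",
"    \"\"\"\n",
"    targets = {word.lower() for word in words}\n",
"\n",
"    def tokens(text: str) -> set:\n",
"        # split on runs of commas/whitespace and drop empty strings\n",
"        return {tok for tok in re.split(r'[,\\s]+', text.lower()) if tok}\n",
"\n",
"    # missing texts are treated as empty strings\n",
"    mask = df[text_col].fillna('').map(lambda text: targets <= tokens(text))\n",
"    return df.loc[mask]\n",
"\n",
"\n",
"# toy example with a few descriptions copied from the outputs in this notebook;\n",
"# with the real data, pass e.g. temp1 instead of toy\n",
"toy = pd.DataFrame({'descr': ['Tägliche Überprüfung der Ölabscheider',\n",
"                              'Wöchentliche Kontrolle der WC-Anlagen',\n",
"                              'Kupplung defekt']})\n",
"find_descriptions_with_words(toy, ['Kontrolle', 'WC-Anlagen'])"
]
},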
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unclear relationships at the 1200 threshold"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>descr</th>\n",
" <th>len</th>\n",
" <th>num_occur</th>\n",
" <th>assoc_obj_ids</th>\n",
" <th>num_assoc_obj_ids</th>\n",
" </tr>\n",
" <tr>\n",
" <th>index</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>161</th>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>66</td>\n",
" <td>92592</td>\n",
" <td>[0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53...</td>\n",
" <td>206</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>Wöchentliche Sichtkontrolle Reinigung</td>\n",
" <td>37</td>\n",
" <td>1654</td>\n",
" <td>[301, 304, 305, 313, 314, 331, 332, 510, 511, ...</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>130</th>\n",
" <td>Tägliche Überprüfung der Ölabscheider</td>\n",
" <td>37</td>\n",
" <td>1616</td>\n",
" <td>[0, 970, 2134, 2137]</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>159</th>\n",
" <td>Wöchentliche Kontrolle der WC-Anlagen</td>\n",
" <td>37</td>\n",
" <td>1265</td>\n",
" <td>[1352, 1353, 1354, 1684, 1685, 1686, 1687, 168...</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>139</th>\n",
" <td>Halbjährliche Kontrolle des Stabbreithalters</td>\n",
" <td>44</td>\n",
" <td>687</td>\n",
" <td>[51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6...</td>\n",
" <td>166</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2675</th>\n",
" <td>Stand 15.07.2020 Stöppel: Herr Langner Toyota ...</td>\n",
" <td>253</td>\n",
" <td>1</td>\n",
" <td>[311]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2674</th>\n",
" <td>Zahnräder der Laufkatze verschlissen Ersatztei...</td>\n",
" <td>167</td>\n",
" <td>1</td>\n",
" <td>[415]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2673</th>\n",
" <td>Bitte 8 Scheiben nach Muster anfertigen. Danke.</td>\n",
" <td>47</td>\n",
" <td>1</td>\n",
" <td>[140]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2672</th>\n",
" <td>Schalter für Bühne Schwenken abgerissen, bitte...</td>\n",
" <td>123</td>\n",
" <td>1</td>\n",
" <td>[323]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6781</th>\n",
" <td>Befestigung Deckel für Batteriefach defekt Hal...</td>\n",
" <td>99</td>\n",
" <td>1</td>\n",
" <td>[326]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6782 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" descr len num_occur \\\n",
"index \n",
"161 Tägliche Wartungstätigkeiten nach Vorgabe des ... 66 92592 \n",
"33 Wöchentliche Sichtkontrolle Reinigung 37 1654 \n",
"130 Tägliche Überprüfung der Ölabscheider 37 1616 \n",
"159 Wöchentliche Kontrolle der WC-Anlagen 37 1265 \n",
"139 Halbjährliche Kontrolle des Stabbreithalters 44 687 \n",
"... ... ... ... \n",
"2675 Stand 15.07.2020 Stöppel: Herr Langner Toyota ... 253 1 \n",
"2674 Zahnräder der Laufkatze verschlissen Ersatztei... 167 1 \n",
"2673 Bitte 8 Scheiben nach Muster anfertigen. Danke. 47 1 \n",
"2672 Schalter für Bühne Schwenken abgerissen, bitte... 123 1 \n",
"6781 Befestigung Deckel für Batteriefach defekt Hal... 99 1 \n",
"\n",
" assoc_obj_ids num_assoc_obj_ids \n",
"index \n",
"161 [0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53... 206 \n",
"33 [301, 304, 305, 313, 314, 331, 332, 510, 511, ... 18 \n",
"130 [0, 970, 2134, 2137] 4 \n",
"159 [1352, 1353, 1354, 1684, 1685, 1686, 1687, 168... 11 \n",
"139 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6... 166 \n",
"... ... ... \n",
"2675 [311] 1 \n",
"2674 [415] 1 \n",
"2673 [140] 1 \n",
"2672 [323] 1 \n",
"6781 [326] 1 \n",
"\n",
"[6782 rows x 5 columns]"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
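"# Optional filter: only descriptions that occur at least 3 times.\n",
"# It was immediately overwritten by the full copy, so it is commented out here.\n",
"#temp2 = temp1.loc[temp1['num_occur'] >= 3, :].copy()\n",
"temp2 = temp1.copy()"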
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#temp2 = temp2.iloc[:30,:]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The lemmas are lower-cased below, so the search words must be lower-cased too;\n",
"# with the original set(['E1.8']) no description could ever match.\n",
"check_words = {word.lower() for word in ['E1.8']}\n",
"target_indices = []\n",
"\n",
"for idx, row in temp2.iterrows():\n",
"    doc = nlp(row['descr'])\n",
"\n",
"    # lower-cased lemmas of all tokens whose coarse POS or fine-grained tag is of interest\n",
"    token_set = {token.lemma_.lower() for token in doc\n",
"                 if token.pos_ in POS_of_interest or token.tag_ in TAG_of_interest}\n",
"\n",
"    if token_set.issuperset(check_words):\n",
"        target_indices.append(idx)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"target_indices"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Vorgaben aus Pleva Wartungsplan Schmieren der Rollenlager der beiden Kameralaufschlitten des Strukturdetektors SD 1C siehe Extradaten'"
]
},
"execution_count": 506,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"idx = target_indices[3]\n",
"temp2.at[idx, 'descr']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Leiterprüfung derzeit in Arbeit Abteilungsleiter sind per Email am 11.06.2019 über deren Eigenverantwortlichkeit und Mithilfe durch Herr Graf informiert worden.'"
]
},
"execution_count": 229,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp2.at[1921,'descr']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 197,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"token_set.issuperset(check_words)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'ADJD'}"
]
},
"execution_count": 180,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"POS_of_interest\n",
"TAG_of_interest"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test = 'Tägliche, tägliche Wartungstätigkeit des Maschinenherstellers Maschine'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"doc = nlp(test)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"täglich\n",
"--\n",
"täglich\n",
"wartungstätigkeit\n",
"der\n",
"maschinenhersteller\n",
"maschine\n"
]
}
],
"source": [
"for token in doc:\n",
"    print(token.lemma_.lower())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
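"# str.replace() matches literal substrings, not regex classes: '\\s' would only remove a literal\n",
"# backslash followed by 's'. All whitespace is handled by str.split() later anyway.\n",
"replace_chars = [',', '\\n', '\\t']"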
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test = test.lower()\n",
"for char in replace_chars:\n",
"    # replace with a space rather than '' so that words separated only by e.g. ',' or a line break stay apart\n",
"    test = test.replace(char, ' ')\n",
"test = set(test.split())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'des', 'maschine', 'maschinenherstellers', 'tägliche', 'wartungstätigkeit'}"
]
},
"execution_count": 112,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test.issuperset(check_words)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Interim results:**\n",
"\n",
"*Some ObjektIDs contain the escape character and others do not; no ObjektID occurs with both variants.*"
]
},
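{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Sketch (an assumption, not part of the original analysis):* the interim result above can be re-checked with a group-by over the IDs. For each ID it records whether any of its rows contains the escape character and whether all of them do; an ID has *both* variants exactly when `any` is true and `all` is false. The helper `ids_with_both_variants` is hypothetical, and the text column and escape character must be set to the ones that were actually inspected (they are not recorded in this section). An empty result confirms the observation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"\n",
"def ids_with_both_variants(df: pd.DataFrame, id_col: str, text_col: str, marker: str) -> list:\n",
"    \"\"\"Return the IDs that have rows both with and without `marker` in `text_col` (hypothetical helper).\"\"\"\n",
"    # literal substring test; missing texts count as 'no marker'\n",
"    has_marker = df[text_col].fillna('').str.contains(marker, regex=False)\n",
"    # per ID: does at least one row contain the marker, and do all rows contain it?\n",
"    per_id = has_marker.groupby(df[id_col]).agg(['any', 'all'])\n",
"    return per_id[per_id['any'] & ~per_id['all']].index.tolist()\n",
"\n",
"\n",
"# toy example with made-up data: ID 1 mixes both variants, ID 2 always has the marker, ID 3 never\n",
"toy = pd.DataFrame({'ObjektID': [1, 1, 2, 2, 3],\n",
"                    'text': ['a\\nb', 'c', 'd\\ne', 'f\\ng', 'h']})\n",
"ids_with_both_variants(toy, id_col='ObjektID', text_col='text', marker='\\n')  # -> [1]"
]
},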
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Anzahl der Duplikate = 47689 für Beschreibung mit Index-Nr. 171:\n",
" Tägliche Wartungstätigkeiten nach Vorgabe des Maschinenherstellers\n",
"\n"
]
}
],
"source": [
"print(f\"Anzahl der Duplikate = {max_val} für Beschreibung mit Index-Nr. {index}:\\n {text}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"# Feature 2: VorgangsArtText"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"feature = 'VorgangsArtText'"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
"base = wo_duplicates.copy()\n",
"base = base.dropna(axis=0, subset=feature)\n",
"base[feature] = base[feature].map(clean_string)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>VorgangsID</th>\n",
" <th>ObjektID</th>\n",
" <th>HObjektText</th>\n",
" <th>ObjektArtID</th>\n",
" <th>ObjektArtText</th>\n",
" <th>VorgangsTypID</th>\n",
" <th>VorgangsTypName</th>\n",
" <th>VorgangsDatum</th>\n",
" <th>VorgangsStatusId</th>\n",
" <th>VorgangsPrioritaet</th>\n",
" <th>VorgangsBeschreibung</th>\n",
" <th>VorgangsOrt</th>\n",
" <th>VorgangsArtText</th>\n",
" <th>ErledigungsDatum</th>\n",
" <th>ErledigungsArtText</th>\n",
" <th>ErledigungsBeschreibung</th>\n",
" <th>MPMelderArbeitsplatz</th>\n",
" <th>MPAbteilungBezeichnung</th>\n",
" <th>Arbeitsbeginn</th>\n",
" <th>ErstellungsDatum</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>11</td>\n",
" <td>114</td>\n",
" <td>427 C , Webmaschine, DL 280 EMS Breite 280</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-06</td>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Kettbaum kaputt</td>\n",
" <td>2019-03-06</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Weberei</td>\n",
" <td>Weberei</td>\n",
" <td>NaT</td>\n",
" <td>2019-03-06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>17</td>\n",
" <td>124</td>\n",
" <td>621 C , Webmaschine, DL 280 EMS Breite 280</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-11</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>asgasdg</td>\n",
" <td>2019-03-11</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Elektrowerkstatt</td>\n",
" <td>Elektrowerkstatt</td>\n",
" <td>NaT</td>\n",
" <td>2019-03-11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>53</td>\n",
" <td>244</td>\n",
" <td>285 C, Webmaschine, SG 220 EMS</td>\n",
" <td>5</td>\n",
" <td>Greifer-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-19</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Kupplung schleift</td>\n",
" <td>NaN</td>\n",
" <td>Kupplung defekt</td>\n",
" <td>2019-03-20</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>NaN</td>\n",
" <td>Weberei</td>\n",
" <td>Weberei</td>\n",
" <td>NaT</td>\n",
" <td>2019-03-19</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>58</td>\n",
" <td>257</td>\n",
" <td>107, Webmaschine, OM 220 EOS</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-21</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Gegengewicht wieder anbringen</td>\n",
" <td>NaN</td>\n",
" <td>Gegengewicht an der Webmaschine abgefallen</td>\n",
" <td>2019-03-21</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>Schraube ausgebohrt\\nGegengewicht wieder angeb...</td>\n",
" <td>Weberei</td>\n",
" <td>Weberei</td>\n",
" <td>2019-03-21</td>\n",
" <td>2019-03-21</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>81</td>\n",
" <td>138</td>\n",
" <td>00138, Schärmaschine 9,</td>\n",
" <td>16</td>\n",
" <td>Schärmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-25</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>da ist etwas gebrochen. (Herr Heininger)</td>\n",
" <td>NaN</td>\n",
" <td>zentrale Bremsenverstellung linke Gatterseite ...</td>\n",
" <td>2019-03-25</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>Bolzen gebrochen. Bolzen neu angefertig und di...</td>\n",
" <td>Vorwerk</td>\n",
" <td>Vorwerk</td>\n",
" <td>2019-03-25</td>\n",
" <td>2019-03-25</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" VorgangsID ObjektID HObjektText \\\n",
"0 11 114 427 C , Webmaschine, DL 280 EMS Breite 280 \n",
"1 17 124 621 C , Webmaschine, DL 280 EMS Breite 280 \n",
"2 53 244 285 C, Webmaschine, SG 220 EMS \n",
"3 58 257 107, Webmaschine, OM 220 EOS \n",
"4 81 138 00138, Schärmaschine 9, \n",
"\n",
" ObjektArtID ObjektArtText VorgangsTypID VorgangsTypName \\\n",
"0 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n",
"1 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n",
"2 5 Greifer-Webmaschine 3 Reparaturauftrag (Portal) \n",
"3 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n",
"4 16 Schärmaschine 3 Reparaturauftrag (Portal) \n",
"\n",
" VorgangsDatum VorgangsStatusId VorgangsPrioritaet \\\n",
"0 2019-03-06 4 0 \n",
"1 2019-03-11 5 0 \n",
"2 2019-03-19 5 0 \n",
"3 2019-03-21 5 0 \n",
"4 2019-03-25 5 0 \n",
"\n",
" VorgangsBeschreibung VorgangsOrt \\\n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 Kupplung schleift NaN \n",
"3 Gegengewicht wieder anbringen NaN \n",
"4 da ist etwas gebrochen. (Herr Heininger) NaN \n",
"\n",
" VorgangsArtText ErledigungsDatum \\\n",
"0 Kettbaum kaputt 2019-03-06 \n",
"1 asgasdg 2019-03-11 \n",
"2 Kupplung defekt 2019-03-20 \n",
"3 Gegengewicht an der Webmaschine abgefallen 2019-03-21 \n",
"4 zentrale Bremsenverstellung linke Gatterseite ... 2019-03-25 \n",
"\n",
" ErledigungsArtText ErledigungsBeschreibung \\\n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 Reparatur UTT NaN \n",
"3 Reparatur UTT Schraube ausgebohrt\\nGegengewicht wieder angeb... \n",
"4 Reparatur UTT Bolzen gebrochen. Bolzen neu angefertig und di... \n",
"\n",
" MPMelderArbeitsplatz MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n",
"0 Weberei Weberei NaT 2019-03-06 \n",
"1 Elektrowerkstatt Elektrowerkstatt NaT 2019-03-11 \n",
"2 Weberei Weberei NaT 2019-03-19 \n",
"3 Weberei Weberei 2019-03-21 2019-03-21 \n",
"4 Vorwerk Vorwerk 2019-03-25 2019-03-25 "
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"base.head()"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Einträge: 128936\n"
]
}
],
"source": [
"descriptions = base[feature]\n",
"print(f\"Einträge: {len(descriptions)}\")"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Anzahl Duplikate VorgangsArtText: 128545\n",
"Anzahl einzigartiger VorgangsArtText: 391\n",
"Anteil einzigartiger VorgangsArtText: 0.30 %\n"
]
}
],
"source": [
"num_dupl_descr = descriptions.duplicated().sum()\n",
"uni_descr = descriptions.unique()\n",
"num_uni_descr = len(uni_descr)\n",
"\n",
"print(f\"Anzahl Duplikate {feature}: {num_dupl_descr}\")\n",
"print(f\"Anzahl einzigartiger {feature}: {num_uni_descr}\")\n",
"print(f\"Anteil einzigartiger {feature}: {num_uni_descr / len(descriptions) * 100:.2f} %\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 58,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# per unique description: text length, number of occurrences and associated object IDs\n",
"if not LOAD_CALC_FILES:\n",
|
||
" cols = ['descr', 'len', 'num_occur', 'assoc_obj_ids', 'num_assoc_obj_ids']\n",
|
||
" descr_df = pd.DataFrame(columns=cols)\n",
|
||
" max_val = 0\n",
|
||
" text = None\n",
|
||
" index = 0\n",
|
||
"\n",
|
||
"\n",
|
||
" for idx, description in enumerate(uni_descr):\n",
|
||
" len_descr = len(description)\n",
|
||
" filt = base[feature] == description\n",
|
||
" temp = base[filt]\n",
|
||
" assoc_obj_ids = temp['ObjektID'].unique()\n",
|
||
" assoc_obj_ids = np.sort(assoc_obj_ids, kind='stable')\n",
|
||
" num_assoc_obj_ids = len(assoc_obj_ids)\n",
|
||
" num_dupl = filt.sum()\n",
|
||
" \n",
|
||
" conc_df = pd.DataFrame(data=[[\n",
|
||
" description,\n",
|
||
" len_descr,\n",
|
||
" num_dupl,\n",
|
||
" assoc_obj_ids,\n",
|
||
" num_assoc_obj_ids\n",
|
||
" ]], columns=cols)\n",
|
||
" \n",
|
||
" descr_df = pd.concat([descr_df, conc_df], ignore_index=True)\n",
|
||
" \n",
|
||
" if num_dupl > max_val:\n",
|
||
" max_val = num_dupl\n",
|
||
" index = idx\n",
|
||
" text = description\n",
|
||
" \n",
|
||
" temp1 = descr_df.sort_values(by='num_occur', ascending=False)"
|
||
]
|
||
},
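{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above filters `base` once per unique description. A possible alternative groups `base` only once. This is a sketch, not the procedure used in this notebook. `summarise_descriptions` is a hypothetical helper that assumes the column `ObjektID` and the variable `feature` as used above, and it returns the same columns as `temp1`.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"\n",
"def summarise_descriptions(base: pd.DataFrame, feature: str) -> pd.DataFrame:\n",
"    # hypothetical helper: one pass over base instead of one filter per description\n",
"    rows = []\n",
"    # sort=False keeps the order of first appearance (like Series.unique());\n",
"    # rows with a missing text are skipped because groupby uses dropna=True by default\n",
"    for descr, obj_ids in base.groupby(feature, sort=False)['ObjektID']:\n",
"        assoc_obj_ids = np.sort(obj_ids.unique(), kind='stable')\n",
"        rows.append({\n",
"            'descr': descr,\n",
"            'len': len(descr),\n",
"            'num_occur': len(obj_ids),\n",
"            'assoc_obj_ids': assoc_obj_ids,\n",
"            'num_assoc_obj_ids': len(assoc_obj_ids),\n",
"        })\n",
"    cols = ['descr', 'len', 'num_occur', 'assoc_obj_ids', 'num_assoc_obj_ids']\n",
"    summary = pd.DataFrame(rows, columns=cols)\n",
"    return summary.sort_values(by='num_occur', ascending=False)\n",
"```\n",
"\n",
"Hypothetical usage: `temp1 = summarise_descriptions(base, feature)`. As with the loop, the index follows the order of first appearance before sorting."
]
},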
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 59,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>descr</th>\n",
|
||
" <th>len</th>\n",
|
||
" <th>num_occur</th>\n",
|
||
" <th>assoc_obj_ids</th>\n",
|
||
" <th>num_assoc_obj_ids</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>60</th>\n",
|
||
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
|
||
" <td>44</td>\n",
|
||
" <td>92719</td>\n",
|
||
" <td>[0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53...</td>\n",
|
||
" <td>206</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>10</th>\n",
|
||
" <td>01 Interne Reinigung Pflege Überprüfung</td>\n",
|
||
" <td>39</td>\n",
|
||
" <td>11250</td>\n",
|
||
" <td>[0, 7, 425, 426, 427, 428, 429, 517, 518, 576,...</td>\n",
|
||
" <td>349</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>28</th>\n",
|
||
" <td>02 Interne Reinigung Pflege Überprüfung</td>\n",
|
||
" <td>39</td>\n",
|
||
" <td>3263</td>\n",
|
||
" <td>[576, 906, 910, 940, 941, 942, 943, 1040, 1041...</td>\n",
|
||
" <td>52</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>29</th>\n",
|
||
" <td>Maschinen-Wartung wöchentlich</td>\n",
|
||
" <td>29</td>\n",
|
||
" <td>2408</td>\n",
|
||
" <td>[1, 301, 305, 313, 314, 331, 332, 510, 511, 51...</td>\n",
|
||
" <td>25</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>46</th>\n",
|
||
" <td>Gesetzliche Wartung Prüfung jährlich</td>\n",
|
||
" <td>36</td>\n",
|
||
" <td>2403</td>\n",
|
||
" <td>[0, 191, 193, 195, 197, 200, 287, 288, 289, 29...</td>\n",
|
||
" <td>638</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>222</th>\n",
|
||
" <td>Walze WK 03 Umlenkwalze zapfen</td>\n",
|
||
" <td>30</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>[1]</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>224</th>\n",
|
||
" <td>Leiter Nr. 90 und überprüfen</td>\n",
|
||
" <td>28</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>[1]</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>225</th>\n",
|
||
" <td>Locht nicht mehr</td>\n",
|
||
" <td>16</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>[338]</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>226</th>\n",
|
||
" <td>Maschine stellt immer wieder ab</td>\n",
|
||
" <td>31</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>[338]</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>390</th>\n",
|
||
" <td>Gesetzliche Wartung Prüfung Anlagenprüfung Dru...</td>\n",
|
||
" <td>56</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>[547]</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>391 rows × 5 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" descr len num_occur \\\n",
|
||
"60 Tägliche Interne Wartungstätigkeiten Weberei 44 92719 \n",
|
||
"10 01 Interne Reinigung Pflege Überprüfung 39 11250 \n",
|
||
"28 02 Interne Reinigung Pflege Überprüfung 39 3263 \n",
|
||
"29 Maschinen-Wartung wöchentlich 29 2408 \n",
|
||
"46 Gesetzliche Wartung Prüfung jährlich 36 2403 \n",
|
||
".. ... .. ... \n",
|
||
"222 Walze WK 03 Umlenkwalze zapfen 30 1 \n",
|
||
"224 Leiter Nr. 90 und überprüfen 28 1 \n",
|
||
"225 Locht nicht mehr 16 1 \n",
|
||
"226 Maschine stellt immer wieder ab 31 1 \n",
|
||
"390 Gesetzliche Wartung Prüfung Anlagenprüfung Dru... 56 1 \n",
|
||
"\n",
|
||
" assoc_obj_ids num_assoc_obj_ids \n",
|
||
"60 [0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53... 206 \n",
|
||
"10 [0, 7, 425, 426, 427, 428, 429, 517, 518, 576,... 349 \n",
|
||
"28 [576, 906, 910, 940, 941, 942, 943, 1040, 1041... 52 \n",
|
||
"29 [1, 301, 305, 313, 314, 331, 332, 510, 511, 51... 25 \n",
|
||
"46 [0, 191, 193, 195, 197, 200, 287, 288, 289, 29... 638 \n",
|
||
".. ... ... \n",
|
||
"222 [1] 1 \n",
|
||
"224 [1] 1 \n",
|
||
"225 [338] 1 \n",
|
||
"226 [338] 1 \n",
|
||
"390 [547] 1 \n",
|
||
"\n",
|
||
"[391 rows x 5 columns]"
|
||
]
|
||
},
|
||
"execution_count": 59,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"temp1"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 60,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# save the result, or load a previously saved result if LOAD_CALC_FILES is set\n",
|
||
"FILE_PATH = f'{feature}_analyse_1.fth'\n",
|
||
"if LOAD_CALC_FILES:\n",
|
||
" temp1 = pd.read_feather(FILE_PATH)\n",
|
||
" temp1 = temp1.set_index('index')\n",
|
||
"else:\n",
|
||
" save_df = temp1.reset_index()\n",
|
||
" save_df.to_feather(FILE_PATH)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### Entire dataset"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 61,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# use all unique descriptions together with their number of occurrences\n",
"descr = temp1[['descr', 'num_occur']]\n",
"# optional: restrict to a subset for testing\n",
"#descr = descr.iloc[50:200,:]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 62,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"#descr.iat[0,0] = 'Das ist ein Test am 24.08.2023'"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 63,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"391"
|
||
]
|
||
},
|
||
"execution_count": 63,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"len(descr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 64,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>descr</th>\n",
|
||
" <th>num_occur</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>60</th>\n",
|
||
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
|
||
" <td>92719</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>10</th>\n",
|
||
" <td>01 Interne Reinigung Pflege Überprüfung</td>\n",
|
||
" <td>11250</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>28</th>\n",
|
||
" <td>02 Interne Reinigung Pflege Überprüfung</td>\n",
|
||
" <td>3263</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>29</th>\n",
|
||
" <td>Maschinen-Wartung wöchentlich</td>\n",
|
||
" <td>2408</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>46</th>\n",
|
||
" <td>Gesetzliche Wartung Prüfung jährlich</td>\n",
|
||
" <td>2403</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>222</th>\n",
|
||
" <td>Walze WK 03 Umlenkwalze zapfen</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>224</th>\n",
|
||
" <td>Leiter Nr. 90 und überprüfen</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>225</th>\n",
|
||
" <td>Locht nicht mehr</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>226</th>\n",
|
||
" <td>Maschine stellt immer wieder ab</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>390</th>\n",
|
||
" <td>Gesetzliche Wartung Prüfung Anlagenprüfung Dru...</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>391 rows × 2 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" descr num_occur\n",
|
||
"60 Tägliche Interne Wartungstätigkeiten Weberei 92719\n",
|
||
"10 01 Interne Reinigung Pflege Überprüfung 11250\n",
|
||
"28 02 Interne Reinigung Pflege Überprüfung 3263\n",
|
||
"29 Maschinen-Wartung wöchentlich 2408\n",
|
||
"46 Gesetzliche Wartung Prüfung jährlich 2403\n",
|
||
".. ... ...\n",
|
||
"222 Walze WK 03 Umlenkwalze zapfen 1\n",
|
||
"224 Leiter Nr. 90 und überprüfen 1\n",
|
||
"225 Locht nicht mehr 1\n",
|
||
"226 Maschine stellt immer wieder ab 1\n",
|
||
"390 Gesetzliche Wartung Prüfung Anlagenprüfung Dru... 1\n",
|
||
"\n",
|
||
"[391 rows x 2 columns]"
|
||
]
|
||
},
|
||
"execution_count": 64,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"descr"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 65,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"#LOAD_CALC_FILES = True\n",
|
||
"#LOAD_CALC_FILES = False\n",
|
||
"#IS_TEST = True\n",
|
||
"IS_TEST = False"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 66,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"INFO:base:Number of entries processed: 1, Percent completed: 0.26\n"
|
||
]
|
||
},
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"INFO:base:Number of entries processed: 101, Percent completed: 25.83\n",
|
||
"INFO:base:Number of entries processed: 201, Percent completed: 51.41\n",
|
||
"INFO:base:Number of entries processed: 301, Percent completed: 76.98\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"# collect token connections (weighted by the number of occurrences) as input for the adjacency matrix\n",
|
||
"connections = dict()\n",
|
||
"unique_tokens = set()\n",
|
||
"UPDATE_STATUS = 100\n",
|
||
"length_data = len(descr)\n",
|
||
"spell_check_candidates = set()\n",
|
||
"spell_checker = SpellChecker(language='de', distance=1)\n",
|
||
"\n",
|
||
"if not LOAD_CALC_FILES or IS_TEST:\n",
|
||
"    for count, (_, row) in enumerate(descr.iterrows()):\n",
"        \n",
"        text = row['descr']\n",
"        weight = row['num_occur']\n",
|
||
" \n",
|
||
" doc = nlp(text)\n",
|
||
" \n",
|
||
" obtain_descendant_info(\n",
|
||
" doc=doc,\n",
|
||
" weight=weight,\n",
|
||
" POS_of_interest=POS_of_interest,\n",
|
||
" TAG_of_interest=TAG_of_interest,\n",
|
||
" connections=connections,\n",
|
||
" unique_tokens=unique_tokens,\n",
|
||
" spell_check_candidates=spell_check_candidates,\n",
|
||
" spell_check_whitelist=spell_check_whitelist,\n",
|
||
" spell_checker=spell_checker,\n",
|
||
" corrections=corrections,\n",
|
||
" )\n",
|
||
" \n",
|
||
" if count % UPDATE_STATUS == 0:\n",
|
||
" logger.info(f'Number of entries processed: {count+1}, Percent completed: {((count+1) / length_data) * 100:.2f}')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 67,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"ADJ_DF_PATH = f'./Graphanalyse/adj_mat_df_{feature}.fth'\n",
|
||
"if not IS_TEST:\n",
|
||
" if LOAD_CALC_FILES:\n",
|
||
" adj_mat_undir = pd.read_feather(ADJ_DF_PATH)\n",
|
||
" adj_mat_undir = adj_mat_undir.set_index('index')\n",
|
||
" # additional information\n",
|
||
" connections = load_pickle('connections.pkl')\n",
|
||
" unique_tokens = load_pickle('unique_tokens.pkl')\n",
|
||
" else:\n",
|
||
" adj_mat = obtain_adj_matrix(unique_tokens=unique_tokens, connections=connections)\n",
|
||
" adj_mat_undir = make_undir_adj_matrix(adj_mat=adj_mat)\n",
|
||
" save_df = adj_mat_undir.reset_index()\n",
|
||
" save_df.to_feather(ADJ_DF_PATH)\n",
|
||
" # additional information\n",
|
||
" save_pickle(obj=connections, path='connections.pkl')\n",
|
||
" save_pickle(obj=unique_tokens, path='unique_tokens.pkl')"
|
||
]
|
||
},
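{
"cell_type": "markdown",
"metadata": {},
"source": [
"`obtain_adj_matrix` and `make_undir_adj_matrix` are not defined in this section. Purely as an illustration, one common way to obtain a symmetric (undirected) adjacency matrix from directed, weighted token pairs is to add the matrix to its transpose. The edge format `(source, target, weight)` and the function name `undirected_adjacency` are assumptions and need not match the structure stored in `connections`.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"\n",
"def undirected_adjacency(edges) -> pd.DataFrame:\n",
"    # edges: iterable of (source_token, target_token, weight); assumed format\n",
"    edges = list(edges)\n",
"    tokens = sorted({tok for src, tgt, _ in edges for tok in (src, tgt)})\n",
"    position = {tok: i for i, tok in enumerate(tokens)}\n",
"\n",
"    # directed, weighted adjacency matrix: directed[i, j] = total weight of i -> j\n",
"    directed = np.zeros((len(tokens), len(tokens)), dtype=np.int64)\n",
"    for src, tgt, weight in edges:\n",
"        directed[position[src], position[tgt]] += weight\n",
"\n",
"    # symmetrise: A + A^T counts both directions; self-loops would be doubled,\n",
"    # so the diagonal is reset to its directed value\n",
"    undirected = directed + directed.T\n",
"    np.fill_diagonal(undirected, np.diag(directed))\n",
"    return pd.DataFrame(undirected, index=tokens, columns=tokens)\n",
"```\n",
"\n",
"Whether self-loops are kept or doubled is a design choice; the sketch keeps them once."
]
},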
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 68,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>lecken</th>\n",
|
||
" <th>WC</th>\n",
|
||
" <th>LKW</th>\n",
|
||
" <th>offen</th>\n",
|
||
" <th>Maschinen-Reinigung</th>\n",
|
||
" <th>Dockenwickler</th>\n",
|
||
" <th>halb-jährlich</th>\n",
|
||
" <th>Tisch</th>\n",
|
||
" <th>zentral</th>\n",
|
||
" <th>anbringen</th>\n",
|
||
" <th>...</th>\n",
|
||
" <th>undicht-</th>\n",
|
||
" <th>Platine</th>\n",
|
||
" <th>erneuern</th>\n",
|
||
" <th>Verschmutzung</th>\n",
|
||
" <th>befestigen</th>\n",
|
||
" <th>wechseln</th>\n",
|
||
" <th>Labor</th>\n",
|
||
" <th>Walze</th>\n",
|
||
" <th>anfahren</th>\n",
|
||
" <th>Leiter</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>12-monatige-Inspektion</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2-monatlich</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2-wöchentlich</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>24-monatige-Inspektion</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3-jährlich</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Ölwechsel</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Überprüfung</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>äußerer</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>überprüfen</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>überziehen</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>390 rows × 390 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" lecken WC LKW offen Maschinen-Reinigung \\\n",
|
||
"12-monatige-Inspektion 0 0 0 0 0 \n",
|
||
"2-monatlich 0 0 0 0 0 \n",
|
||
"2-wöchentlich 0 0 0 0 0 \n",
|
||
"24-monatige-Inspektion 0 0 0 0 0 \n",
|
||
"3-jährlich 0 0 0 0 0 \n",
|
||
"... ... .. ... ... ... \n",
|
||
"Ölwechsel 0 0 0 0 0 \n",
|
||
"Überprüfung 0 0 0 0 0 \n",
|
||
"äußerer 0 0 0 0 0 \n",
|
||
"überprüfen 0 0 0 0 0 \n",
|
||
"überziehen 0 0 0 0 0 \n",
|
||
"\n",
|
||
" Dockenwickler halb-jährlich Tisch zentral \\\n",
|
||
"12-monatige-Inspektion 0 0 0 0 \n",
|
||
"2-monatlich 0 0 0 0 \n",
|
||
"2-wöchentlich 0 0 0 0 \n",
|
||
"24-monatige-Inspektion 0 0 0 0 \n",
|
||
"3-jährlich 0 0 0 0 \n",
|
||
"... ... ... ... ... \n",
|
||
"Ölwechsel 0 0 0 0 \n",
|
||
"Überprüfung 0 0 0 0 \n",
|
||
"äußerer 0 0 0 0 \n",
|
||
"überprüfen 0 0 0 0 \n",
|
||
"überziehen 0 0 0 0 \n",
|
||
"\n",
|
||
" anbringen ... undicht- Platine erneuern \\\n",
|
||
"12-monatige-Inspektion 0 ... 0 0 0 \n",
|
||
"2-monatlich 0 ... 0 0 0 \n",
|
||
"2-wöchentlich 0 ... 0 0 0 \n",
|
||
"24-monatige-Inspektion 0 ... 0 0 0 \n",
|
||
"3-jährlich 0 ... 0 0 0 \n",
|
||
"... ... ... ... ... ... \n",
|
||
"Ölwechsel 0 ... 0 0 0 \n",
|
||
"Überprüfung 0 ... 0 0 0 \n",
|
||
"äußerer 0 ... 0 0 0 \n",
|
||
"überprüfen 0 ... 0 0 0 \n",
|
||
"überziehen 0 ... 0 0 0 \n",
|
||
"\n",
|
||
" Verschmutzung befestigen wechseln Labor Walze \\\n",
|
||
"12-monatige-Inspektion 0 0 0 0 0 \n",
|
||
"2-monatlich 0 0 0 0 0 \n",
|
||
"2-wöchentlich 0 0 0 0 0 \n",
|
||
"24-monatige-Inspektion 0 0 0 0 0 \n",
|
||
"3-jährlich 0 0 0 0 0 \n",
|
||
"... ... ... ... ... ... \n",
|
||
"Ölwechsel 0 0 0 0 0 \n",
|
||
"Überprüfung 0 0 0 0 0 \n",
|
||
"äußerer 0 0 0 0 0 \n",
|
||
"überprüfen 0 0 0 0 0 \n",
|
||
"überziehen 0 0 0 0 1 \n",
|
||
"\n",
|
||
" anfahren Leiter \n",
|
||
"12-monatige-Inspektion 0 0 \n",
|
||
"2-monatlich 0 0 \n",
|
||
"2-wöchentlich 0 0 \n",
|
||
"24-monatige-Inspektion 0 0 \n",
|
||
"3-jährlich 0 0 \n",
|
||
"... ... ... \n",
|
||
"Ölwechsel 0 0 \n",
|
||
"Überprüfung 0 0 \n",
|
||
"äußerer 0 0 \n",
|
||
"überprüfen 0 1 \n",
|
||
"überziehen 0 0 \n",
|
||
"\n",
|
||
"[390 rows x 390 columns]"
|
||
]
|
||
},
|
||
"execution_count": 68,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"adj_mat_undir.sort_index()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 69,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"arr = adj_mat_undir.to_numpy()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 70,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"391"
|
||
]
|
||
},
|
||
"execution_count": 70,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"np.count_nonzero(arr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 71,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"92964"
|
||
]
|
||
},
|
||
"execution_count": 71,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"np.max(arr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
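"##### Threshold\n",
"\n",
"Edge weights below `WEIGHT_THRESHOLD` are set to 0. Then the number of tokens with at least one remaining edge is counted."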
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 162,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# edge weights below this value are set to 0 (0 keeps all edges for non-negative weights)\n",
"WEIGHT_THRESHOLD = 0"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 163,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"arr = adj_mat_undir.to_numpy()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 164,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"arr = np.where(arr < WEIGHT_THRESHOLD, 0, arr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 165,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"391"
|
||
]
|
||
},
|
||
"execution_count": 165,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"np.count_nonzero(arr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 166,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"233"
|
||
]
|
||
},
|
||
"execution_count": 166,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# number of tokens (columns) with at least one remaining edge\n",
"temp = np.sum(arr, axis=0)\n",
"np.count_nonzero(temp)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 167,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"thresh_adj_mat = adj_mat_undir.copy()\n",
|
||
"thresh_adj_mat.loc[:] = arr"
|
||
]
|
||
},
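{
"cell_type": "markdown",
"metadata": {},
"source": [
"For non-negative weights, `WEIGHT_THRESHOLD = 0` removes no edge. This matches the unchanged count of 391 non-zero entries before and after thresholding. The following sketch (hypothetical helper `threshold_adjacency`) bundles the thresholding steps and additionally drops tokens without any remaining edge. It assumes non-negative weights and identical row and column labels.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"\n",
"def threshold_adjacency(adj: pd.DataFrame, threshold: float) -> pd.DataFrame:\n",
"    # hypothetical helper; assumes non-negative weights and identical row/column labels\n",
"    weights = adj.to_numpy()\n",
"    # same rule as above: weights below the threshold become 0\n",
"    thresholded = np.where(weights < threshold, 0, weights)\n",
"    result = pd.DataFrame(thresholded, index=adj.index, columns=adj.columns)\n",
"    # a token keeps at least one edge if its column sum is non-zero\n",
"    connected = result.sum(axis=0) != 0\n",
"    return result.loc[connected, connected]\n",
"```\n",
"\n",
"Hypothetical usage: `threshold_adjacency(adj_mat_undir, WEIGHT_THRESHOLD)`. The number of remaining rows then corresponds to `np.count_nonzero(temp)` above."
]
},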
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 168,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Wasserleitung</th>\n",
|
||
" <th>wechseln</th>\n",
|
||
" <th>Winkelpositionsgeber</th>\n",
|
||
" <th>Klimaanlagengerät</th>\n",
|
||
" <th>versetzen</th>\n",
|
||
" <th>Brennschlitten</th>\n",
|
||
" <th>feststellen</th>\n",
|
||
" <th>Stuhl</th>\n",
|
||
" <th>monatlich</th>\n",
|
||
" <th>anfertigen</th>\n",
|
||
" <th>...</th>\n",
|
||
" <th>Zahnriemen</th>\n",
|
||
" <th>Rampe</th>\n",
|
||
" <th>Tisch</th>\n",
|
||
" <th>defekt</th>\n",
|
||
" <th>Elektrische</th>\n",
|
||
" <th>haben</th>\n",
|
||
" <th>Wasserenthärtungsanlage</th>\n",
|
||
" <th>Gestank</th>\n",
|
||
" <th>Zahnrad</th>\n",
|
||
" <th>hydraulisch</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Wasserleitung</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>wechseln</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Winkelpositionsgeber</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Klimaanlagengerät</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>versetzen</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>haben</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Wasserenthärtungsanlage</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Gestank</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Zahnrad</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>hydraulisch</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>390 rows × 390 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Wasserleitung wechseln Winkelpositionsgeber \\\n",
|
||
"Wasserleitung 0 0 0 \n",
|
||
"wechseln 0 0 0 \n",
|
||
"Winkelpositionsgeber 0 0 0 \n",
|
||
"Klimaanlagengerät 0 0 0 \n",
|
||
"versetzen 0 0 0 \n",
|
||
"... ... ... ... \n",
|
||
"haben 0 0 0 \n",
|
||
"Wasserenthärtungsanlage 0 0 0 \n",
|
||
"Gestank 0 0 0 \n",
|
||
"Zahnrad 0 0 0 \n",
|
||
"hydraulisch 0 0 0 \n",
|
||
"\n",
|
||
" Klimaanlagengerät versetzen Brennschlitten \\\n",
|
||
"Wasserleitung 0 0 0 \n",
|
||
"wechseln 0 0 0 \n",
|
||
"Winkelpositionsgeber 0 0 0 \n",
|
||
"Klimaanlagengerät 0 0 0 \n",
|
||
"versetzen 0 0 0 \n",
|
||
"... ... ... ... \n",
|
||
"haben 0 0 0 \n",
|
||
"Wasserenthärtungsanlage 0 0 0 \n",
|
||
"Gestank 0 0 0 \n",
|
||
"Zahnrad 0 0 0 \n",
|
||
"hydraulisch 0 0 0 \n",
|
||
"\n",
|
||
" feststellen Stuhl monatlich anfertigen ... \\\n",
|
||
"Wasserleitung 0 0 0 0 ... \n",
|
||
"wechseln 0 0 0 0 ... \n",
|
||
"Winkelpositionsgeber 0 0 0 0 ... \n",
|
||
"Klimaanlagengerät 0 0 0 0 ... \n",
|
||
"versetzen 0 0 0 0 ... \n",
|
||
"... ... ... ... ... ... \n",
|
||
"haben 0 0 0 0 ... \n",
|
||
"Wasserenthärtungsanlage 0 0 0 0 ... \n",
|
||
"Gestank 0 0 0 0 ... \n",
|
||
"Zahnrad 0 0 0 0 ... \n",
|
||
"hydraulisch 0 0 0 0 ... \n",
|
||
"\n",
|
||
" Zahnriemen Rampe Tisch defekt Elektrische haben \\\n",
|
||
"Wasserleitung 0 0 0 0 0 0 \n",
|
||
"wechseln 0 0 0 0 0 0 \n",
|
||
"Winkelpositionsgeber 0 0 0 1 0 0 \n",
|
||
"Klimaanlagengerät 0 0 0 0 0 0 \n",
|
||
"versetzen 0 0 0 0 0 0 \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"haben 0 0 0 0 0 0 \n",
|
||
"Wasserenthärtungsanlage 0 0 0 0 0 0 \n",
|
||
"Gestank 0 0 0 0 0 0 \n",
|
||
"Zahnrad 0 0 0 0 0 0 \n",
|
||
"hydraulisch 0 0 0 0 0 0 \n",
|
||
"\n",
|
||
" Wasserenthärtungsanlage Gestank Zahnrad \\\n",
|
||
"Wasserleitung 0 0 0 \n",
|
||
"wechseln 0 0 0 \n",
|
||
"Winkelpositionsgeber 0 0 0 \n",
|
||
"Klimaanlagengerät 0 0 0 \n",
|
||
"versetzen 0 0 0 \n",
|
||
"... ... ... ... \n",
|
||
"haben 0 0 0 \n",
|
||
"Wasserenthärtungsanlage 0 0 0 \n",
|
||
"Gestank 0 0 0 \n",
|
||
"Zahnrad 0 0 0 \n",
|
||
"hydraulisch 0 0 0 \n",
|
||
"\n",
|
||
" hydraulisch \n",
|
||
"Wasserleitung 0 \n",
|
||
"wechseln 0 \n",
|
||
"Winkelpositionsgeber 0 \n",
|
||
"Klimaanlagengerät 0 \n",
|
||
"versetzen 0 \n",
|
||
"... ... \n",
|
||
"haben 0 \n",
|
||
"Wasserenthärtungsanlage 0 \n",
|
||
"Gestank 0 \n",
|
||
"Zahnrad 0 \n",
|
||
"hydraulisch 0 \n",
|
||
"\n",
|
||
"[390 rows x 390 columns]"
|
||
]
|
||
},
|
||
"execution_count": 168,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"thresh_adj_mat"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 169,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"ADJ_MAT_PATH_CSV = f'./Graphanalyse/adj_mat_thresh_{feature}_{WEIGHT_THRESHOLD}.csv'\n",
|
||
"thresh_adj_mat.to_csv(path_or_buf=ADJ_MAT_PATH_CSV, encoding='cp1252', sep=';')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"---\n",
|
||
"\n",
|
||
"# Feature 3: ErledigungsBeschreibung"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 72,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"feature = 'ErledigungsBeschreibung'"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 73,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"base = wo_duplicates.copy()\n",
"base = base.dropna(axis=0, subset=[feature])  # drop rows without a completion description\n",
"base[feature] = base[feature].map(clean_string)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 74,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>VorgangsID</th>\n",
|
||
" <th>ObjektID</th>\n",
|
||
" <th>HObjektText</th>\n",
|
||
" <th>ObjektArtID</th>\n",
|
||
" <th>ObjektArtText</th>\n",
|
||
" <th>VorgangsTypID</th>\n",
|
||
" <th>VorgangsTypName</th>\n",
|
||
" <th>VorgangsDatum</th>\n",
|
||
" <th>VorgangsStatusId</th>\n",
|
||
" <th>VorgangsPrioritaet</th>\n",
|
||
" <th>VorgangsBeschreibung</th>\n",
|
||
" <th>VorgangsOrt</th>\n",
|
||
" <th>VorgangsArtText</th>\n",
|
||
" <th>ErledigungsDatum</th>\n",
|
||
" <th>ErledigungsArtText</th>\n",
|
||
" <th>ErledigungsBeschreibung</th>\n",
|
||
" <th>MPMelderArbeitsplatz</th>\n",
|
||
" <th>MPAbteilungBezeichnung</th>\n",
|
||
" <th>Arbeitsbeginn</th>\n",
|
||
" <th>ErstellungsDatum</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>58</td>\n",
|
||
" <td>257</td>\n",
|
||
" <td>107, Webmaschine, OM 220 EOS</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Luft-Webmaschine</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Reparaturauftrag (Portal)</td>\n",
|
||
" <td>2019-03-21</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Gegengewicht wieder anbringen</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Gegengewicht an der Webmaschine abgefallen</td>\n",
|
||
" <td>2019-03-21</td>\n",
|
||
" <td>Reparatur UTT</td>\n",
|
||
" <td>Schraube ausgebohrt Gegengewicht wieder angebr...</td>\n",
|
||
" <td>Weberei</td>\n",
|
||
" <td>Weberei</td>\n",
|
||
" <td>2019-03-21</td>\n",
|
||
" <td>2019-03-21</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>81</td>\n",
|
||
" <td>138</td>\n",
|
||
" <td>00138, Schärmaschine 9,</td>\n",
|
||
" <td>16</td>\n",
|
||
" <td>Schärmaschine</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Reparaturauftrag (Portal)</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>da ist etwas gebrochen. (Herr Heininger)</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>zentrale Bremsenverstellung linke Gatterseite ...</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>Reparatur UTT</td>\n",
|
||
" <td>Bolzen gebrochen. Bolzen neu angefertig und di...</td>\n",
|
||
" <td>Vorwerk</td>\n",
|
||
" <td>Vorwerk</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>82</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Warenschau allgemein</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Reparaturauftrag (Portal)</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Klappbügel Portalkran H31 defekt</td>\n",
|
||
" <td>Warenschau allgemein</td>\n",
|
||
" <td>Allgemeine Reparaturarbeiten</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>Reparatur UTT</td>\n",
|
||
" <td>Feder ausgetauscht</td>\n",
|
||
" <td>Warenschau</td>\n",
|
||
" <td>Warenschau</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6</th>\n",
|
||
" <td>76</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Neben der Türe</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Reparaturauftrag (Portal)</td>\n",
|
||
" <td>2019-03-22</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Schraube nix mer gut</td>\n",
|
||
" <td>Neben der Türe</td>\n",
|
||
" <td>Kettbaum</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>Reparatur UTT</td>\n",
|
||
" <td>Schrauben ausgebohrt Gewinde nachgeschnitten</td>\n",
|
||
" <td>Vorwerk</td>\n",
|
||
" <td>Vorwerk</td>\n",
|
||
" <td>2019-03-25</td>\n",
|
||
" <td>2019-03-22</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>8</th>\n",
|
||
" <td>111</td>\n",
|
||
" <td>241</td>\n",
|
||
" <td>294 C, Webmaschine, SG 240 EMS</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>Greifer-Webmaschine</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Reparaturauftrag (Portal)</td>\n",
|
||
" <td>2019-04-01</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>KBK tauschen\\nUrsache vermutlich mechanisch</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Kupplung-Brems-Kombination</td>\n",
|
||
" <td>2019-04-08</td>\n",
|
||
" <td>Reparatur UTT</td>\n",
|
||
" <td>da derzeit Keine Ersatzteile da Reparatur mit ...</td>\n",
|
||
" <td>Weberei</td>\n",
|
||
" <td>Weberei</td>\n",
|
||
" <td>2019-04-02</td>\n",
|
||
" <td>2019-04-01</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" VorgangsID ObjektID HObjektText ObjektArtID \\\n",
|
||
"3 58 257 107, Webmaschine, OM 220 EOS 3 \n",
|
||
"4 81 138 00138, Schärmaschine 9, 16 \n",
|
||
"5 82 0 Warenschau allgemein 0 \n",
|
||
"6 76 0 Neben der Türe 0 \n",
|
||
"8 111 241 294 C, Webmaschine, SG 240 EMS 5 \n",
|
||
"\n",
|
||
" ObjektArtText VorgangsTypID VorgangsTypName \\\n",
|
||
"3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n",
|
||
"4 Schärmaschine 3 Reparaturauftrag (Portal) \n",
|
||
"5 NaN 3 Reparaturauftrag (Portal) \n",
|
||
"6 NaN 3 Reparaturauftrag (Portal) \n",
|
||
"8 Greifer-Webmaschine 3 Reparaturauftrag (Portal) \n",
|
||
"\n",
|
||
" VorgangsDatum VorgangsStatusId VorgangsPrioritaet \\\n",
|
||
"3 2019-03-21 5 0 \n",
|
||
"4 2019-03-25 5 0 \n",
|
||
"5 2019-03-25 5 0 \n",
|
||
"6 2019-03-22 5 0 \n",
|
||
"8 2019-04-01 5 0 \n",
|
||
"\n",
|
||
" VorgangsBeschreibung VorgangsOrt \\\n",
|
||
"3 Gegengewicht wieder anbringen NaN \n",
|
||
"4 da ist etwas gebrochen. (Herr Heininger) NaN \n",
|
||
"5 Klappbügel Portalkran H31 defekt Warenschau allgemein \n",
|
||
"6 Schraube nix mer gut Neben der Türe \n",
|
||
"8 KBK tauschen\\nUrsache vermutlich mechanisch NaN \n",
|
||
"\n",
|
||
" VorgangsArtText ErledigungsDatum \\\n",
|
||
"3 Gegengewicht an der Webmaschine abgefallen 2019-03-21 \n",
|
||
"4 zentrale Bremsenverstellung linke Gatterseite ... 2019-03-25 \n",
|
||
"5 Allgemeine Reparaturarbeiten 2019-03-25 \n",
|
||
"6 Kettbaum 2019-03-25 \n",
|
||
"8 Kupplung-Brems-Kombination 2019-04-08 \n",
|
||
"\n",
|
||
" ErledigungsArtText ErledigungsBeschreibung \\\n",
|
||
"3 Reparatur UTT Schraube ausgebohrt Gegengewicht wieder angebr... \n",
|
||
"4 Reparatur UTT Bolzen gebrochen. Bolzen neu angefertig und di... \n",
|
||
"5 Reparatur UTT Feder ausgetauscht \n",
|
||
"6 Reparatur UTT Schrauben ausgebohrt Gewinde nachgeschnitten \n",
|
||
"8 Reparatur UTT da derzeit Keine Ersatzteile da Reparatur mit ... \n",
|
||
"\n",
|
||
" MPMelderArbeitsplatz MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n",
|
||
"3 Weberei Weberei 2019-03-21 2019-03-21 \n",
|
||
"4 Vorwerk Vorwerk 2019-03-25 2019-03-25 \n",
|
||
"5 Warenschau Warenschau 2019-03-25 2019-03-25 \n",
|
||
"6 Vorwerk Vorwerk 2019-03-25 2019-03-22 \n",
|
||
"8 Weberei Weberei 2019-04-02 2019-04-01 "
|
||
]
|
||
},
|
||
"execution_count": 74,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"base.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 75,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Einträge: 118086\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"descriptions = base[feature]\n",
|
||
"print(f\"Einträge: {len(descriptions)}\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 76,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Anzahl Duplikate ErledigungsBeschreibung: 110707\n",
|
||
"Anzahl einzigartiger ErledigungsBeschreibung: 7379\n",
|
||
"Anteil einzigartiger ErledigungsBeschreibung: 6.25 %\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"num_dupl_descr = descriptions.duplicated().sum()\n",
|
||
"uni_descr = descriptions.unique()\n",
|
||
"num_uni_descr = len(uni_descr)\n",
|
||
"\n",
|
||
"print(f\"Anzahl Duplikate {feature}: {num_dupl_descr}\")\n",
|
||
"print(f\"Anzahl einzigartiger {feature}: {num_uni_descr}\")\n",
|
||
"print(f\"Anteil einzigartiger {feature}: {num_uni_descr / len(descriptions) * 100:.2f} %\")"
|
||
]
|
||
},
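{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three figures above are linked. After `dropna`, every entry is either the first occurrence of its text or a duplicate, so `duplicated().sum() + nunique()` equals the number of entries (110707 + 7379 = 118086), and the unique share is 7379 / 118086 ≈ 6.25 %. A minimal check of this identity on hypothetical toy data:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# hypothetical toy descriptions, not taken from the dataset\n",
"s = pd.Series(['Sichtkontrolle', 'Reinigung', 'Sichtkontrolle', 'Sichtkontrolle'])\n",
"\n",
"num_dupl = s.duplicated().sum()  # repeats after the first occurrence (keep='first')\n",
"num_unique = s.nunique()         # distinct texts (NaN excluded by default)\n",
"assert num_dupl + num_unique == len(s)\n",
"share_unique = num_unique / len(s) * 100\n",
"```"
]
},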
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 77,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"False"
|
||
]
|
||
},
|
||
"execution_count": 77,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"LOAD_CALC_FILES"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 78,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"if not LOAD_CALC_FILES:\n",
"    cols = ['descr', 'len', 'num_occur', 'assoc_obj_ids', 'num_assoc_obj_ids']\n",
"    rows = []  # collect rows in a list; pd.concat inside the loop is quadratic\n",
"    max_val = 0\n",
"    text = None\n",
"    index = 0\n",
"\n",
"    for idx, description in enumerate(uni_descr):\n",
"        filt = base[feature] == description\n",
"        # unique object IDs associated with this description, sorted ascending\n",
"        assoc_obj_ids = np.sort(base.loc[filt, 'ObjektID'].unique(), kind='stable')\n",
"        num_dupl = filt.sum()\n",
"        rows.append([description, len(description), num_dupl, assoc_obj_ids, len(assoc_obj_ids)])\n",
"\n",
"        # remember the most frequent description\n",
"        if num_dupl > max_val:\n",
"            max_val = num_dupl\n",
"            index = idx\n",
"            text = description\n",
"\n",
"    descr_df = pd.DataFrame(rows, columns=cols)\n",
"    temp1 = descr_df.sort_values(by='num_occur', ascending=False)"
|
||
]
|
||
},
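{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above filters `base` once per unique description, so its cost grows with (number of entries × number of unique descriptions). A sketch of a single-pass alternative with `groupby`, assuming only the columns `feature` and `ObjektID` are needed. With `sort=False` the groups keep their order of first appearance, as `unique()` does, so the resulting index should line up with the loop's. Shown on hypothetical toy data:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# hypothetical toy frame with the two columns the loop reads\n",
"base = pd.DataFrame({\n",
"    'ErledigungsBeschreibung': ['A', 'B', 'A', 'A'],\n",
"    'ObjektID': [7, 3, 1, 7],\n",
"})\n",
"feature = 'ErledigungsBeschreibung'\n",
"\n",
"# one pass: occurrence count and sorted unique ObjektIDs per description\n",
"grouped = base.groupby(feature, sort=False)['ObjektID']\n",
"descr_df = pd.DataFrame({\n",
"    'num_occur': grouped.size(),\n",
"    'assoc_obj_ids': grouped.unique().map(np.sort),\n",
"})\n",
"descr_df['num_assoc_obj_ids'] = descr_df['assoc_obj_ids'].map(len)\n",
"descr_df = descr_df.rename_axis('descr').reset_index()\n",
"descr_df.insert(1, 'len', descr_df['descr'].str.len())\n",
"temp1 = descr_df.sort_values(by='num_occur', ascending=False)\n",
"```"
]
},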
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 79,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>descr</th>\n",
|
||
" <th>len</th>\n",
|
||
" <th>num_occur</th>\n",
|
||
" <th>assoc_obj_ids</th>\n",
|
||
" <th>num_assoc_obj_ids</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>112</th>\n",
|
||
" <td>Sichtkontrolle durchgeführt Auffälligkeiten fe...</td>\n",
|
||
" <td>95</td>\n",
|
||
" <td>98720</td>\n",
|
||
" <td>[0, 1, 7, 17, 41, 42, 43, 44, 45, 46, 47, 51, ...</td>\n",
|
||
" <td>953</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>108</th>\n",
|
||
" <td>Sichtkontrolle durchgeführt Auffälligkeiten fe...</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>1450</td>\n",
|
||
" <td>[0, 1, 140, 301, 305, 313, 314, 576, 970, 1110...</td>\n",
|
||
" <td>28</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>147</th>\n",
|
||
" <td>Externe Prüfung wurde durchgeführt Beanstandun...</td>\n",
|
||
" <td>119</td>\n",
|
||
" <td>1082</td>\n",
|
||
" <td>[191, 193, 195, 197, 200, 264, 287, 288, 289, ...</td>\n",
|
||
" <td>413</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>128</th>\n",
|
||
" <td>Reinigung durchgeführt Auffälligkeiten festges...</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>762</td>\n",
|
||
" <td>[0, 1, 7, 123, 136, 137, 138, 177, 298, 304, 3...</td>\n",
|
||
" <td>90</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>96</th>\n",
|
||
" <td>Sichtkontrolle wie festgelegt durchgeführt Auf...</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>648</td>\n",
|
||
" <td>[1, 20, 21, 51, 52, 53, 54, 55, 56, 64, 65, 66...</td>\n",
|
||
" <td>271</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2805</th>\n",
|
||
" <td>X Achse Süd Führungswägen Kurze Version eingebaut</td>\n",
|
||
" <td>49</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>[21]</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2804</th>\n",
|
||
" <td>Maschinenrahmen ausgerichtet und ausgebeult. M...</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>[144]</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2803</th>\n",
|
||
" <td>Bügel und Stützräder getauscht</td>\n",
|
||
" <td>30</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>[315]</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2802</th>\n",
|
||
" <td>Graf: TK wurde in Arbeitsauftrag 65487 gewandelt</td>\n",
|
||
" <td>48</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>[405]</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>7378</th>\n",
|
||
" <td>Neue Gasfeder eingebaut</td>\n",
|
||
" <td>23</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>[326]</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>7379 rows × 5 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" descr len num_occur \\\n",
|
||
"112 Sichtkontrolle durchgeführt Auffälligkeiten fe... 95 98720 \n",
|
||
"108 Sichtkontrolle durchgeführt Auffälligkeiten fe... 100 1450 \n",
|
||
"147 Externe Prüfung wurde durchgeführt Beanstandun... 119 1082 \n",
|
||
"128 Reinigung durchgeführt Auffälligkeiten festges... 90 762 \n",
|
||
"96 Sichtkontrolle wie festgelegt durchgeführt Auf... 110 648 \n",
|
||
"... ... ... ... \n",
|
||
"2805 X Achse Süd Führungswägen Kurze Version eingebaut 49 1 \n",
|
||
"2804 Maschinenrahmen ausgerichtet und ausgebeult. M... 90 1 \n",
|
||
"2803 Bügel und Stützräder getauscht 30 1 \n",
|
||
"2802 Graf: TK wurde in Arbeitsauftrag 65487 gewandelt 48 1 \n",
|
||
"7378 Neue Gasfeder eingebaut 23 1 \n",
|
||
"\n",
|
||
" assoc_obj_ids num_assoc_obj_ids \n",
|
||
"112 [0, 1, 7, 17, 41, 42, 43, 44, 45, 46, 47, 51, ... 953 \n",
|
||
"108 [0, 1, 140, 301, 305, 313, 314, 576, 970, 1110... 28 \n",
|
||
"147 [191, 193, 195, 197, 200, 264, 287, 288, 289, ... 413 \n",
|
||
"128 [0, 1, 7, 123, 136, 137, 138, 177, 298, 304, 3... 90 \n",
|
||
"96 [1, 20, 21, 51, 52, 53, 54, 55, 56, 64, 65, 66... 271 \n",
|
||
"... ... ... \n",
|
||
"2805 [21] 1 \n",
|
||
"2804 [144] 1 \n",
|
||
"2803 [315] 1 \n",
|
||
"2802 [405] 1 \n",
|
||
"7378 [326] 1 \n",
|
||
"\n",
|
||
"[7379 rows x 5 columns]"
|
||
]
|
||
},
|
||
"execution_count": 79,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"temp1"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 81,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"'Sichtkontrolle durchgeführt Auffälligkeiten festgestellt vom Ausführenden bitte dazu schreiben:'"
|
||
]
|
||
},
|
||
"execution_count": 81,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"temp1.iat[0,0]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 82,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"'Sichtkontrolle durchgeführt Auffälligkeiten festgestellt vom Ausführenden bitte dazu schreiben: Nein'"
|
||
]
|
||
},
|
||
"execution_count": 82,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"temp1.iat[1,0]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 83,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# save/load dataframe\n",
|
||
"FILE_PATH = f'{feature}_analyse_1.fth'\n",
|
||
"if LOAD_CALC_FILES:\n",
|
||
" temp1 = pd.read_feather(FILE_PATH)\n",
|
||
" temp1 = temp1.set_index('index')\n",
|
||
"else:\n",
|
||
" save_df = temp1.reset_index()\n",
|
||
" save_df.to_feather(FILE_PATH)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### Entire dataset"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 84,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# all unique descriptions with their number of occurrences\n",
"descr = temp1[['descr', 'num_occur']]\n",
"#descr = descr.iloc[50:200,:]  # optional: restrict to a subset for testing"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 85,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"#descr.iat[0,0] = 'Das ist ein Test am 24.08.2023'"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 86,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"7379"
|
||
]
|
||
},
|
||
"execution_count": 86,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"len(descr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 87,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>descr</th>\n",
|
||
" <th>num_occur</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>112</th>\n",
|
||
" <td>Sichtkontrolle durchgeführt Auffälligkeiten fe...</td>\n",
|
||
" <td>98720</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>108</th>\n",
|
||
" <td>Sichtkontrolle durchgeführt Auffälligkeiten fe...</td>\n",
|
||
" <td>1450</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>147</th>\n",
|
||
" <td>Externe Prüfung wurde durchgeführt Beanstandun...</td>\n",
|
||
" <td>1082</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>128</th>\n",
|
||
" <td>Reinigung durchgeführt Auffälligkeiten festges...</td>\n",
|
||
" <td>762</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>96</th>\n",
|
||
" <td>Sichtkontrolle wie festgelegt durchgeführt Auf...</td>\n",
|
||
" <td>648</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2805</th>\n",
|
||
" <td>X Achse Süd Führungswägen Kurze Version eingebaut</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2804</th>\n",
|
||
" <td>Maschinenrahmen ausgerichtet und ausgebeult. M...</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2803</th>\n",
|
||
" <td>Bügel und Stützräder getauscht</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2802</th>\n",
|
||
" <td>Graf: TK wurde in Arbeitsauftrag 65487 gewandelt</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>7378</th>\n",
|
||
" <td>Neue Gasfeder eingebaut</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>7379 rows × 2 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" descr num_occur\n",
|
||
"112 Sichtkontrolle durchgeführt Auffälligkeiten fe... 98720\n",
|
||
"108 Sichtkontrolle durchgeführt Auffälligkeiten fe... 1450\n",
|
||
"147 Externe Prüfung wurde durchgeführt Beanstandun... 1082\n",
|
||
"128 Reinigung durchgeführt Auffälligkeiten festges... 762\n",
|
||
"96 Sichtkontrolle wie festgelegt durchgeführt Auf... 648\n",
|
||
"... ... ...\n",
|
||
"2805 X Achse Süd Führungswägen Kurze Version eingebaut 1\n",
|
||
"2804 Maschinenrahmen ausgerichtet und ausgebeult. M... 1\n",
|
||
"2803 Bügel und Stützräder getauscht 1\n",
|
||
"2802 Graf: TK wurde in Arbeitsauftrag 65487 gewandelt 1\n",
|
||
"7378 Neue Gasfeder eingebaut 1\n",
|
||
"\n",
|
||
"[7379 rows x 2 columns]"
|
||
]
|
||
},
|
||
"execution_count": 87,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"descr"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 88,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"#LOAD_CALC_FILES = True\n",
|
||
"#LOAD_CALC_FILES = False\n",
|
||
"#IS_TEST = True\n",
|
||
"IS_TEST = False"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 89,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"INFO:base:Number of entries processed: 1, Percent completed: 0.01\n",
|
||
"INFO:base:Number of entries processed: 501, Percent completed: 6.79\n",
|
||
"INFO:base:Number of entries processed: 1001, Percent completed: 13.57\n",
|
||
"INFO:base:Number of entries processed: 1501, Percent completed: 20.34\n",
|
||
"INFO:base:Number of entries processed: 2001, Percent completed: 27.12\n",
|
||
"INFO:base:Number of entries processed: 2501, Percent completed: 33.89\n",
|
||
"INFO:base:Number of entries processed: 3001, Percent completed: 40.67\n",
|
||
"INFO:base:Number of entries processed: 3501, Percent completed: 47.45\n",
|
||
"INFO:base:Number of entries processed: 4001, Percent completed: 54.22\n",
|
||
"INFO:base:Number of entries processed: 4501, Percent completed: 61.00\n",
|
||
"INFO:base:Number of entries processed: 5001, Percent completed: 67.77\n",
|
||
"INFO:base:Number of entries processed: 5501, Percent completed: 74.55\n",
|
||
"INFO:base:Number of entries processed: 6001, Percent completed: 81.33\n",
|
||
"INFO:base:Number of entries processed: 6501, Percent completed: 88.10\n",
|
||
"INFO:base:Number of entries processed: 7001, Percent completed: 94.88\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"# collect token connections via obtain_descendant_info for the adjacency matrix\n",
"connections = dict()\n",
"unique_tokens = set()\n",
"UPDATE_STATUS = 500  # log progress every 500 descriptions\n",
"length_data = len(descr)\n",
"spell_check_candidates = set()\n",
"spell_checker = SpellChecker(language='de', distance=1)\n",
"\n",
"if not LOAD_CALC_FILES or IS_TEST:\n",
"    for count, (_, row) in enumerate(descr.iterrows()):\n",
"        text = row['descr']\n",
"        weight = row['num_occur']  # each unique description is weighted by how often it occurs\n",
"\n",
"        doc = nlp(text)\n",
"\n",
"        obtain_descendant_info(\n",
"            doc=doc,\n",
"            weight=weight,\n",
"            POS_of_interest=POS_of_interest,\n",
"            TAG_of_interest=TAG_of_interest,\n",
"            connections=connections,\n",
"            unique_tokens=unique_tokens,\n",
"            spell_check_candidates=spell_check_candidates,\n",
"            spell_check_whitelist=spell_check_whitelist,\n",
"            spell_checker=spell_checker,\n",
"            corrections=corrections,\n",
"        )\n",
"\n",
"        if count % UPDATE_STATUS == 0:\n",
"            logger.info(f'Number of entries processed: {count+1}, Percent completed: {((count+1) / length_data) * 100:.2f}')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 93,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"ADJ_DF_PATH = f'./Graphanalyse/adj_mat_df_{feature}.fth'\n",
|
||
"if not IS_TEST:\n",
|
||
" if LOAD_CALC_FILES:\n",
|
||
" adj_mat_undir = pd.read_feather(ADJ_DF_PATH)\n",
|
||
" adj_mat_undir = adj_mat_undir.set_index('index')\n",
|
||
" # additional information\n",
|
||
" connections = load_pickle('connections.pkl')\n",
|
||
" unique_tokens = load_pickle('unique_tokens.pkl')\n",
|
||
" else:\n",
|
||
" adj_mat = obtain_adj_matrix(unique_tokens=unique_tokens, connections=connections)\n",
|
||
" adj_mat_undir = make_undir_adj_matrix(adj_mat=adj_mat)\n",
|
||
" save_df = adj_mat_undir.reset_index()\n",
|
||
" save_df.to_feather(ADJ_DF_PATH)\n",
|
||
" # additional information\n",
|
||
" save_pickle(obj=connections, path='connections.pkl')\n",
|
||
" save_pickle(obj=unique_tokens, path='unique_tokens.pkl')"
|
||
]
|
||
},
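{
"cell_type": "markdown",
"metadata": {},
"source": [
"`obtain_adj_matrix` and `make_undir_adj_matrix` are defined earlier in the notebook and are not shown here. As a hedged illustration only, assuming `connections` maps directed token pairs to summed weights (an assumption about its layout), one common way to obtain an undirected weighted adjacency matrix is to add the directed matrix to its transpose. The notebook's own helper may use a different rule:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# assumed layout {(source_token, target_token): weight}; hypothetical data\n",
"connections = {\n",
"    ('Sichtkontrolle', 'durchführen'): 3,\n",
"    ('durchführen', 'Sichtkontrolle'): 1,\n",
"    ('Feder', 'austauschen'): 2,\n",
"}\n",
"tokens = sorted({token for edge in connections for token in edge})\n",
"\n",
"# directed adjacency matrix: rows = source token, columns = target token\n",
"adj = pd.DataFrame(0, index=tokens, columns=tokens)\n",
"for (src, dst), weight in connections.items():\n",
"    adj.loc[src, dst] += weight\n",
"\n",
"# undirected: w(u, v) = w(u -> v) + w(v -> u); a self-loop would be counted twice\n",
"adj_undir = adj + adj.T\n",
"```"
]
},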
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 94,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>funktionsfähig</th>\n",
|
||
" <th>Zwischenbehälter</th>\n",
|
||
" <th>Ölfilter</th>\n",
|
||
" <th>Rechter</th>\n",
|
||
" <th>Kontaktproblem</th>\n",
|
||
" <th>Geschweisst</th>\n",
|
||
" <th>vorbereiten</th>\n",
|
||
" <th>Gelenkbolzen</th>\n",
|
||
" <th>Silikonfass</th>\n",
|
||
" <th>Ausbau</th>\n",
|
||
" <th>...</th>\n",
|
||
" <th>Kom</th>\n",
|
||
" <th>anlernen</th>\n",
|
||
" <th>nah</th>\n",
|
||
" <th>Begutachtung</th>\n",
|
||
" <th>Betriebszeit</th>\n",
|
||
" <th>paletten</th>\n",
|
||
" <th>augetreten</th>\n",
|
||
" <th>Antriebszahnrad</th>\n",
|
||
" <th>Gewindereparaturset</th>\n",
|
||
" <th>Heizventil</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>-20C</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>-Befestihgung</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>-Einlaufwalze</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>-Entlüftungssicherung</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>-Faltbalken</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>überzogenn</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>überzoggen</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>übrtprüfen</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>ünerziehen</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>üperprüfen</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>6946 rows × 6946 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" funktionsfähig Zwischenbehälter Ölfilter Rechter \\\n",
|
||
"-20C 0 0 0 0 \n",
|
||
"-Befestihgung 0 0 0 0 \n",
|
||
"-Einlaufwalze 0 0 0 0 \n",
|
||
"-Entlüftungssicherung 0 0 0 0 \n",
|
||
"-Faltbalken 0 0 0 0 \n",
|
||
"... ... ... ... ... \n",
|
||
"überzogenn 0 0 0 0 \n",
|
||
"überzoggen 0 0 0 0 \n",
|
||
"übrtprüfen 0 0 0 0 \n",
|
||
"ünerziehen 0 0 0 0 \n",
|
||
"üperprüfen 0 0 0 0 \n",
|
||
"\n",
|
||
" Kontaktproblem Geschweisst vorbereiten Gelenkbolzen \\\n",
|
||
"-20C 0 0 0 0 \n",
|
||
"-Befestihgung 0 0 0 0 \n",
|
||
"-Einlaufwalze 0 0 0 0 \n",
|
||
"-Entlüftungssicherung 0 0 0 0 \n",
|
||
"-Faltbalken 0 0 0 0 \n",
|
||
"... ... ... ... ... \n",
|
||
"überzogenn 0 0 0 0 \n",
|
||
"überzoggen 0 0 0 0 \n",
|
||
"übrtprüfen 0 0 0 0 \n",
|
||
"ünerziehen 0 0 0 0 \n",
|
||
"üperprüfen 0 0 0 0 \n",
|
||
"\n",
|
||
" Silikonfass Ausbau ... Kom anlernen nah \\\n",
|
||
"-20C 0 0 ... 0 0 0 \n",
|
||
"-Befestihgung 0 0 ... 0 0 0 \n",
|
||
"-Einlaufwalze 0 0 ... 0 0 0 \n",
|
||
"-Entlüftungssicherung 0 0 ... 0 0 0 \n",
|
||
"-Faltbalken 0 0 ... 0 0 0 \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"überzogenn 0 0 ... 0 0 0 \n",
|
||
"überzoggen 0 0 ... 0 0 0 \n",
|
||
"übrtprüfen 0 0 ... 0 0 0 \n",
|
||
"ünerziehen 0 0 ... 0 0 0 \n",
|
||
"üperprüfen 0 0 ... 0 0 0 \n",
|
||
"\n",
|
||
" Begutachtung Betriebszeit paletten augetreten \\\n",
|
||
"-20C 0 0 0 0 \n",
|
||
"-Befestihgung 0 0 0 0 \n",
|
||
"-Einlaufwalze 0 0 0 0 \n",
|
||
"-Entlüftungssicherung 0 0 0 0 \n",
|
||
"-Faltbalken 0 0 0 0 \n",
|
||
"... ... ... ... ... \n",
|
||
"überzogenn 0 0 0 0 \n",
|
||
"überzoggen 0 0 0 0 \n",
|
||
"übrtprüfen 0 0 0 0 \n",
|
||
"ünerziehen 0 0 0 0 \n",
|
||
"üperprüfen 0 0 0 0 \n",
|
||
"\n",
|
||
" Antriebszahnrad Gewindereparaturset Heizventil \n",
|
||
"-20C 0 0 0 \n",
|
||
"-Befestihgung 0 0 0 \n",
|
||
"-Einlaufwalze 0 0 0 \n",
|
||
"-Entlüftungssicherung 0 0 0 \n",
|
||
"-Faltbalken 0 0 0 \n",
|
||
"... ... ... ... \n",
|
||
"überzogenn 0 0 0 \n",
|
||
"überzoggen 0 0 0 \n",
|
||
"übrtprüfen 0 0 0 \n",
|
||
"ünerziehen 0 0 0 \n",
|
||
"üperprüfen 0 0 0 \n",
|
||
"\n",
|
||
"[6946 rows x 6946 columns]"
|
||
]
|
||
},
|
||
"execution_count": 94,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"adj_mat_undir.sort_index()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 95,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"arr = adj_mat_undir.to_numpy()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 96,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"24171"
|
||
]
|
||
},
|
||
"execution_count": 96,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"np.count_nonzero(arr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 97,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"103601"
|
||
]
|
||
},
|
||
"execution_count": 97,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"np.max(arr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Threshold: set all edge weights below `WEIGHT_THRESHOLD` to zero"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 110,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"WEIGHT_THRESHOLD = 30"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 111,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"arr = adj_mat_undir.to_numpy()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 112,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"arr = np.where(arr < WEIGHT_THRESHOLD, 0, arr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 113,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"138"
|
||
]
|
||
},
|
||
"execution_count": 113,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"np.count_nonzero(arr)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 116,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"thresh_adj_mat = adj_mat_undir.copy()\n",
|
||
"thresh_adj_mat.loc[:] = arr"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 117,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>funktionsfähig</th>\n",
|
||
" <th>Zwischenbehälter</th>\n",
|
||
" <th>Ölfilter</th>\n",
|
||
" <th>Rechter</th>\n",
|
||
" <th>Kontaktproblem</th>\n",
|
||
" <th>Geschweisst</th>\n",
|
||
" <th>vorbereiten</th>\n",
|
||
" <th>Gelenkbolzen</th>\n",
|
||
" <th>Silikonfass</th>\n",
|
||
" <th>Ausbau</th>\n",
|
||
" <th>...</th>\n",
|
||
" <th>Kom</th>\n",
|
||
" <th>anlernen</th>\n",
|
||
" <th>nah</th>\n",
|
||
" <th>Begutachtung</th>\n",
|
||
" <th>Betriebszeit</th>\n",
|
||
" <th>paletten</th>\n",
|
||
" <th>augetreten</th>\n",
|
||
" <th>Antriebszahnrad</th>\n",
|
||
" <th>Gewindereparaturset</th>\n",
|
||
" <th>Heizventil</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>funktionsfähig</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Zwischenbehälter</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Ölfilter</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Rechter</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Kontaktproblem</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>paletten</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>augetreten</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antriebszahnrad</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Gewindereparaturset</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Heizventil</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>6946 rows × 6946 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" funktionsfähig Zwischenbehälter Ölfilter Rechter \\\n",
|
||
"funktionsfähig 0 0 0 0 \n",
|
||
"Zwischenbehälter 0 0 0 0 \n",
|
||
"Ölfilter 0 0 0 0 \n",
|
||
"Rechter 0 0 0 0 \n",
|
||
"Kontaktproblem 0 0 0 0 \n",
|
||
"... ... ... ... ... \n",
|
||
"paletten 0 0 0 0 \n",
|
||
"augetreten 0 0 0 0 \n",
|
||
"Antriebszahnrad 0 0 0 0 \n",
|
||
"Gewindereparaturset 0 0 0 0 \n",
|
||
"Heizventil 0 0 0 0 \n",
|
||
"\n",
|
||
" Kontaktproblem Geschweisst vorbereiten Gelenkbolzen \\\n",
|
||
"funktionsfähig 0 0 0 0 \n",
|
||
"Zwischenbehälter 0 0 0 0 \n",
|
||
"Ölfilter 0 0 0 0 \n",
|
||
"Rechter 0 0 0 0 \n",
|
||
"Kontaktproblem 0 0 0 0 \n",
|
||
"... ... ... ... ... \n",
|
||
"paletten 0 0 0 0 \n",
|
||
"augetreten 0 0 0 0 \n",
|
||
"Antriebszahnrad 0 0 0 0 \n",
|
||
"Gewindereparaturset 0 0 0 0 \n",
|
||
"Heizventil 0 0 0 0 \n",
|
||
"\n",
|
||
" Silikonfass Ausbau ... Kom anlernen nah \\\n",
|
||
"funktionsfähig 0 0 ... 0 0 0 \n",
|
||
"Zwischenbehälter 0 0 ... 0 0 0 \n",
|
||
"Ölfilter 0 0 ... 0 0 0 \n",
|
||
"Rechter 0 0 ... 0 0 0 \n",
|
||
"Kontaktproblem 0 0 ... 0 0 0 \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"paletten 0 0 ... 0 0 0 \n",
|
||
"augetreten 0 0 ... 0 0 0 \n",
|
||
"Antriebszahnrad 0 0 ... 0 0 0 \n",
|
||
"Gewindereparaturset 0 0 ... 0 0 0 \n",
|
||
"Heizventil 0 0 ... 0 0 0 \n",
|
||
"\n",
|
||
" Begutachtung Betriebszeit paletten augetreten \\\n",
|
||
"funktionsfähig 0 0 0 0 \n",
|
||
"Zwischenbehälter 0 0 0 0 \n",
|
||
"Ölfilter 0 0 0 0 \n",
|
||
"Rechter 0 0 0 0 \n",
|
||
"Kontaktproblem 0 0 0 0 \n",
|
||
"... ... ... ... ... \n",
|
||
"paletten 0 0 0 0 \n",
|
||
"augetreten 0 0 0 0 \n",
|
||
"Antriebszahnrad 0 0 0 0 \n",
|
||
"Gewindereparaturset 0 0 0 0 \n",
|
||
"Heizventil 0 0 0 0 \n",
|
||
"\n",
|
||
" Antriebszahnrad Gewindereparaturset Heizventil \n",
|
||
"funktionsfähig 0 0 0 \n",
|
||
"Zwischenbehälter 0 0 0 \n",
|
||
"Ölfilter 0 0 0 \n",
|
||
"Rechter 0 0 0 \n",
|
||
"Kontaktproblem 0 0 0 \n",
|
||
"... ... ... ... \n",
|
||
"paletten 0 0 0 \n",
|
||
"augetreten 0 0 0 \n",
|
||
"Antriebszahnrad 0 0 0 \n",
|
||
"Gewindereparaturset 0 0 0 \n",
|
||
"Heizventil 0 0 0 \n",
|
||
"\n",
|
||
"[6946 rows x 6946 columns]"
|
||
]
|
||
},
|
||
"execution_count": 117,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"thresh_adj_mat"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 118,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"ADJ_MAT_PATH_CSV = f'./Graphanalyse/adj_mat_thresh_{feature}_{WEIGHT_THRESHOLD}.csv'\n",
|
||
"thresh_adj_mat.to_csv(path_or_buf=ADJ_MAT_PATH_CSV, encoding='cp1252', sep=';')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"---\n",
|
||
"# **Supplement**\n",
|
||
"\n",
|
||
"#### **Example analysis: the entry with the most duplicates**"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 64,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Number of entries with the selected description: 47689\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"crit = uni_descr[171]\n",
|
||
"filt = wo_duplicates['VorgangsBeschreibung'] == crit\n",
|
||
"temp = wo_duplicates[filt]\n",
|
||
"print(f\"Number of entries with the selected description: {len(temp)}\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 65,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>VorgangsID</th>\n",
|
||
" <th>ObjektID</th>\n",
|
||
" <th>HObjektText</th>\n",
|
||
" <th>ObjektArtID</th>\n",
|
||
" <th>ObjektArtText</th>\n",
|
||
" <th>VorgangsTypID</th>\n",
|
||
" <th>VorgangsTypName</th>\n",
|
||
" <th>VorgangsDatum</th>\n",
|
||
" <th>VorgangsStatusId</th>\n",
|
||
" <th>VorgangsPrioritaet</th>\n",
|
||
" <th>VorgangsBeschreibung</th>\n",
|
||
" <th>VorgangsOrt</th>\n",
|
||
" <th>VorgangsArtText</th>\n",
|
||
" <th>ErledigungsDatum</th>\n",
|
||
" <th>ErledigungsArtText</th>\n",
|
||
" <th>ErledigungsBeschreibung</th>\n",
|
||
" <th>MPMelderArbeitsplatz</th>\n",
|
||
" <th>MPAbteilungBezeichnung</th>\n",
|
||
" <th>Arbeitsbeginn</th>\n",
|
||
" <th>ErstellungsDatum</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>288</th>\n",
|
||
" <td>155717</td>\n",
|
||
" <td>187</td>\n",
|
||
" <td>246, Webmaschine Jacquard,</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>Jacquard-Webmaschine</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Wartung</td>\n",
|
||
" <td>2022-04-01</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
|
||
" <td>2022-04-01</td>\n",
|
||
" <td>Intern UTT - Sichtkontrolle</td>\n",
|
||
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>2022-04-01</td>\n",
|
||
" <td>2022-02-17</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>289</th>\n",
|
||
" <td>152507</td>\n",
|
||
" <td>177</td>\n",
|
||
" <td>204 S SI , Webmaschine, DL 280 EMS Breite 220</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Luft-Webmaschine</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Wartung</td>\n",
|
||
" <td>2022-04-09</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
|
||
" <td>2022-04-09</td>\n",
|
||
" <td>Intern UTT - Sichtkontrolle</td>\n",
|
||
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>2022-04-09</td>\n",
|
||
" <td>2022-02-17</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>318</th>\n",
|
||
" <td>255972</td>\n",
|
||
" <td>249</td>\n",
|
||
" <td>203 C S SI, Webmaschine, DL 280 EMS Breite 220</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Luft-Webmaschine</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Wartung</td>\n",
|
||
" <td>2022-07-30</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
|
||
" <td>2022-07-30</td>\n",
|
||
" <td>Intern UTT - Sichtkontrolle</td>\n",
|
||
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>2022-07-30</td>\n",
|
||
" <td>2022-04-28</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>319</th>\n",
|
||
" <td>255977</td>\n",
|
||
" <td>249</td>\n",
|
||
" <td>203 C S SI, Webmaschine, DL 280 EMS Breite 220</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Luft-Webmaschine</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Wartung</td>\n",
|
||
" <td>2022-08-04</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
|
||
" <td>2022-08-04</td>\n",
|
||
" <td>Intern UTT - Sichtkontrolle</td>\n",
|
||
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>2022-08-04</td>\n",
|
||
" <td>2022-04-28</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>340</th>\n",
|
||
" <td>267942</td>\n",
|
||
" <td>187</td>\n",
|
||
" <td>246, Webmaschine Jacquard,</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>Jacquard-Webmaschine</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Wartung</td>\n",
|
||
" <td>2022-08-07</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
|
||
" <td>2022-08-07</td>\n",
|
||
" <td>Intern UTT - Sichtkontrolle</td>\n",
|
||
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>2022-08-07</td>\n",
|
||
" <td>2022-08-05</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" VorgangsID ObjektID HObjektText \\\n",
|
||
"288 155717 187 246, Webmaschine Jacquard, \n",
|
||
"289 152507 177 204 S SI , Webmaschine, DL 280 EMS Breite 220 \n",
|
||
"318 255972 249 203 C S SI, Webmaschine, DL 280 EMS Breite 220 \n",
|
||
"319 255977 249 203 C S SI, Webmaschine, DL 280 EMS Breite 220 \n",
|
||
"340 267942 187 246, Webmaschine Jacquard, \n",
|
||
"\n",
|
||
" ObjektArtID ObjektArtText VorgangsTypID VorgangsTypName \\\n",
|
||
"288 6 Jacquard-Webmaschine 1 Wartung \n",
|
||
"289 3 Luft-Webmaschine 1 Wartung \n",
|
||
"318 3 Luft-Webmaschine 1 Wartung \n",
|
||
"319 3 Luft-Webmaschine 1 Wartung \n",
|
||
"340 6 Jacquard-Webmaschine 1 Wartung \n",
|
||
"\n",
|
||
" VorgangsDatum VorgangsStatusId VorgangsPrioritaet \\\n",
|
||
"288 2022-04-01 5 0 \n",
|
||
"289 2022-04-09 5 0 \n",
|
||
"318 2022-07-30 5 0 \n",
|
||
"319 2022-08-04 5 0 \n",
|
||
"340 2022-08-07 5 0 \n",
|
||
"\n",
|
||
" VorgangsBeschreibung VorgangsOrt \\\n",
|
||
"288 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
|
||
"289 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
|
||
"318 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
|
||
"319 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
|
||
"340 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
|
||
"\n",
|
||
" VorgangsArtText ErledigungsDatum \\\n",
|
||
"288 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
|
||
"289 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-09 \n",
|
||
"318 Tägliche Interne Wartungstätigkeiten Weberei 2022-07-30 \n",
|
||
"319 Tägliche Interne Wartungstätigkeiten Weberei 2022-08-04 \n",
|
||
"340 Tägliche Interne Wartungstätigkeiten Weberei 2022-08-07 \n",
|
||
"\n",
|
||
" ErledigungsArtText \\\n",
|
||
"288 Intern UTT - Sichtkontrolle \n",
|
||
"289 Intern UTT - Sichtkontrolle \n",
|
||
"318 Intern UTT - Sichtkontrolle \n",
|
||
"319 Intern UTT - Sichtkontrolle \n",
|
||
"340 Intern UTT - Sichtkontrolle \n",
|
||
"\n",
|
||
" ErledigungsBeschreibung MPMelderArbeitsplatz \\\n",
|
||
"288 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
|
||
"289 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
|
||
"318 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
|
||
"319 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
|
||
"340 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
|
||
"\n",
|
||
" MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n",
|
||
"288 NaN 2022-04-01 2022-02-17 \n",
|
||
"289 NaN 2022-04-09 2022-02-17 \n",
|
||
"318 NaN 2022-07-30 2022-04-28 \n",
|
||
"319 NaN 2022-08-04 2022-04-28 \n",
|
||
"340 NaN 2022-08-07 2022-08-05 "
|
||
]
|
||
},
|
||
"execution_count": 65,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"temp.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 66,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# check which features differ between the entries\n",
|
||
"analyse_columns = ['ObjektID', 'VorgangsTypID', 'VorgangsTypName']"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"ObjektID"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 67,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([ 187, 177, 249, 2654, 1792, 272, 271, 270, 269, 268, 186,\n",
|
||
" 178, 179, 2317, 2318, 2473, 2559, 1244, 240, 241, 180, 220,\n",
|
||
" 221, 222, 223, 224, 961, 962, 2166, 3212, 267, 266, 181,\n",
|
||
" 182, 213, 214, 174, 175, 176, 156, 157, 158, 247, 248,\n",
|
||
" 183, 265, 278, 1793, 1794, 218, 217, 219, 215, 216, 2319,\n",
|
||
" 2320, 228, 184, 152, 153, 2165, 154, 155, 159, 167, 168,\n",
|
||
" 169, 2313, 2314, 2315, 2316, 212, 211, 160, 161, 162, 164,\n",
|
||
" 165, 166, 264, 273, 274, 277, 276, 275, 279, 280, 281,\n",
|
||
" 282, 283, 242, 243, 244, 245, 246, 225, 227, 229, 170,\n",
|
||
" 171, 172, 173, 230, 231, 3213, 3211, 3214], dtype=int64)"
|
||
]
|
||
},
|
||
"execution_count": 67,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"temp['ObjektID'].unique()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 68,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"filt = temp['ObjektID'] == 2318\n",
|
||
"temp_fil1 = temp[filt]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 69,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
"text/plain": [
" VorgangsID ObjektID HObjektText \\\n",
"878 269743 2318 A067, Webmaschine, DL 280 EMS Breite 280 \n",
"6099 152490 2318 A067, Webmaschine, DL 280 EMS Breite 280 \n",
"13905 152476 2318 A067, Webmaschine, DL 280 EMS Breite 280 \n",
"14019 248301 2318 A067, Webmaschine, DL 280 EMS Breite 280 \n",
"14211 254914 2318 A067, Webmaschine, DL 280 EMS Breite 280 \n",
"\n",
" ObjektArtID ObjektArtText VorgangsTypID VorgangsTypName \\\n",
"878 3 Luft-Webmaschine 1 Wartung \n",
"6099 3 Luft-Webmaschine 1 Wartung \n",
"13905 3 Luft-Webmaschine 1 Wartung \n",
"14019 3 Luft-Webmaschine 1 Wartung \n",
"14211 3 Luft-Webmaschine 1 Wartung \n",
"\n",
" VorgangsDatum VorgangsStatusId VorgangsPrioritaet \\\n",
"878 2022-10-31 5 0 \n",
"6099 2022-03-24 5 0 \n",
"13905 2022-03-10 5 0 \n",
"14019 2022-04-28 5 0 \n",
"14211 2022-05-19 5 0 \n",
"\n",
" VorgangsBeschreibung VorgangsOrt \\\n",
"878 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"6099 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"13905 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"14019 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"14211 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"\n",
" VorgangsArtText ErledigungsDatum \\\n",
"878 Tägliche Interne Wartungstätigkeiten Weberei 2022-10-31 \n",
"6099 Tägliche Interne Wartungstätigkeiten Weberei 2022-03-24 \n",
"13905 Tägliche Interne Wartungstätigkeiten Weberei 2022-03-10 \n",
"14019 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-28 \n",
"14211 Tägliche Interne Wartungstätigkeiten Weberei 2022-05-19 \n",
"\n",
" ErledigungsArtText \\\n",
"878 Intern UTT - Sichtkontrolle \n",
"6099 Intern UTT - Sichtkontrolle \n",
"13905 Intern UTT - Sichtkontrolle \n",
"14019 Intern UTT - Sichtkontrolle \n",
"14211 Intern UTT - Sichtkontrolle \n",
"\n",
" ErledigungsBeschreibung MPMelderArbeitsplatz \\\n",
"878 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"6099 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"13905 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"14019 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"14211 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"\n",
" MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n",
"878 NaN 2022-10-31 2022-08-05 \n",
"6099 NaN 2022-03-24 2022-02-17 \n",
"13905 NaN 2022-03-10 2022-02-17 \n",
"14019 NaN 2022-04-28 2022-04-14 \n",
"14211 NaN 2022-05-19 2022-04-28 "
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp_fil1.head()"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<DatetimeArray>\n",
"['2022-10-31 00:00:00', '2022-03-24 00:00:00', '2022-03-10 00:00:00',\n",
" '2022-04-28 00:00:00', '2022-05-19 00:00:00', '2022-04-09 00:00:00',\n",
" '2022-04-21 00:00:00', '2022-06-11 00:00:00', '2022-05-12 00:00:00',\n",
" '2022-04-23 00:00:00',\n",
" ...\n",
" '2022-10-28 00:00:00', '2022-07-06 00:00:00', '2023-06-14 00:00:00',\n",
" '2022-10-29 00:00:00', '2022-07-07 00:00:00', '2023-06-15 00:00:00',\n",
" '2022-05-05 00:00:00', '2022-10-30 00:00:00', '2022-07-08 00:00:00',\n",
" '2022-10-19 00:00:00']\n",
"Length: 462, dtype: datetime64[ns]"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp_fil1['VorgangsDatum'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"462"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(temp_fil1)"
]
},
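{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two outputs above agree: `temp_fil1` (ObjektID 2318) has 462 rows and 462 distinct `VorgangsDatum` values, so each date occurs exactly once for this object. Below is a minimal sketch for running the same check on all objects at once. The helper `dates_per_object` and the toy data are hypothetical. `nunique` ignores missing dates while `size` counts them, so rows without a date also show up as a mismatch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"\n",
"def dates_per_object(df):\n",
"    # hypothetical helper: rows and distinct VorgangsDatum values per ObjektID\n",
"    stats = df.groupby('ObjektID')['VorgangsDatum'].agg(['size', 'nunique'])\n",
"    # True where no date occurs more than once for that object\n",
"    stats['one_per_day'] = stats['size'] == stats['nunique']\n",
"    return stats\n",
"\n",
"\n",
"# hypothetical toy data: object 2 has two entries on the same day\n",
"toy = pd.DataFrame({\n",
"    'ObjektID': [1, 1, 2, 2],\n",
"    'VorgangsDatum': pd.to_datetime(['2022-03-10', '2022-03-24', '2022-04-01', '2022-04-01']),\n",
"})\n",
"print(dates_per_object(toy))\n",
"\n",
"# on the notebook data (assumption, not run here):\n",
"# dates_per_object(temp)"
]
},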
{
"cell_type": "markdown",
"metadata": {},
"source": [
"VorgangsID"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of unique VorgangsID values: 1855 (3.89 % of the total number of rows)\n"
]
}
],
"source": [
"uni_VorgangsID = temp['VorgangsID'].unique()\n",
"num_uni_VorgangsID = len(uni_VorgangsID)\n",
"print(f'Number of unique VorgangsID values: {num_uni_VorgangsID} ({num_uni_VorgangsID / len(temp) * 100:.2f} % of the total number of rows)')"
]
},
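{
"cell_type": "markdown",
"metadata": {},
"source": [
"A ratio of 3.89 % means that one `VorgangsID` covers about 26 rows on average (100 / 3.89 ≈ 25.7). An average can hide a skewed distribution, so below is a minimal sketch that shows how many IDs have 1, 2, 3, ... rows. The helper `rows_per_id` and the toy data are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"\n",
"def rows_per_id(df, key='VorgangsID'):\n",
"    # hypothetical helper: number of rows per ID value\n",
"    counts = df[key].value_counts()\n",
"    # how many IDs have 1, 2, 3, ... rows, ordered by group size\n",
"    return counts.value_counts().sort_index()\n",
"\n",
"\n",
"# hypothetical toy data: ID 1 has three rows, IDs 2 and 3 one row each\n",
"toy = pd.DataFrame({'VorgangsID': [1, 1, 1, 2, 3]})\n",
"print(rows_per_id(toy))\n",
"\n",
"# on the notebook data (assumption, not run here):\n",
"# rows_per_id(temp)"
]
},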
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"155717"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"uni_VorgangsID[0]"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"# select all rows of the first VorgangsID (note: reuses the variable name temp_fil1 from the ObjektID filter, so the result depends on execution order)\n",
"filt = temp['VorgangsID'] == uni_VorgangsID[0]\n",
"temp_fil1 = temp[filt]"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" VorgangsID ObjektID HObjektText ObjektArtID \\\n",
"288 155717 187 246, Webmaschine Jacquard, 6 \n",
"2718 155717 1792 A057, Webmaschine Jacquard, 6 \n",
"2719 155717 186 245 J, Webmaschine Jacquard, 6 \n",
"2720 155717 2473 A056, Webmaschine Jacquard, 6 \n",
"5504 155717 2559 A070, Webmaschine Jacquard, 6 \n",
"5505 155717 961 A054, Webmaschine Jacquard, 6 \n",
"5506 155717 962 A055, Webmaschine Jacquard, 6 \n",
"5507 155717 2166 A061, Webmaschine Jacquard, 6 \n",
"5508 155717 1793 A058, Webmaschine Jacquard, 6 \n",
"5509 155717 1794 A059, Webmaschine Jacquard, 6 \n",
"8294 155717 2165 A060, Webmaschine Jacquard, 6 \n",
"\n",
" ObjektArtText VorgangsTypID VorgangsTypName VorgangsDatum \\\n",
"288 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"2718 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"2719 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"2720 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5504 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5505 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5506 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5507 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5508 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5509 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"8294 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"\n",
" VorgangsStatusId VorgangsPrioritaet \\\n",
"288 5 0 \n",
"2718 5 0 \n",
"2719 5 0 \n",
"2720 5 0 \n",
"5504 5 0 \n",
"5505 5 0 \n",
"5506 5 0 \n",
"5507 5 0 \n",
"5508 5 0 \n",
"5509 5 0 \n",
"8294 5 0 \n",
"\n",
" VorgangsBeschreibung VorgangsOrt \\\n",
"288 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"2718 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"2719 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"2720 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"5504 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"5505 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"5506 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"5507 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"5508 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"5509 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"8294 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"\n",
" VorgangsArtText ErledigungsDatum \\\n",
"288 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"2718 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"2719 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"2720 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5504 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5505 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5506 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5507 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5508 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5509 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"8294 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"\n",
" ErledigungsArtText \\\n",
"288 Intern UTT - Sichtkontrolle \n",
"2718 Intern UTT - Sichtkontrolle \n",
"2719 Intern UTT - Sichtkontrolle \n",
"2720 Intern UTT - Sichtkontrolle \n",
"5504 Intern UTT - Sichtkontrolle \n",
"5505 Intern UTT - Sichtkontrolle \n",
"5506 Intern UTT - Sichtkontrolle \n",
"5507 Intern UTT - Sichtkontrolle \n",
"5508 Intern UTT - Sichtkontrolle \n",
"5509 Intern UTT - Sichtkontrolle \n",
"8294 Intern UTT - Sichtkontrolle \n",
"\n",
" ErledigungsBeschreibung MPMelderArbeitsplatz \\\n",
"288 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"2718 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"2719 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"2720 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"5504 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"5505 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"5506 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"5507 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"5508 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"5509 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"8294 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"\n",
" MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n",
"288 NaN 2022-04-01 2022-02-17 \n",
"2718 NaN 2022-04-01 2022-02-17 \n",
"2719 NaN 2022-04-01 2022-02-17 \n",
"2720 NaN 2022-04-01 2022-02-17 \n",
"5504 NaN 2022-04-01 2022-02-17 \n",
"5505 NaN 2022-04-01 2022-02-17 \n",
"5506 NaN 2022-04-01 2022-02-17 \n",
"5507 NaN 2022-04-01 2022-02-17 \n",
"5508 NaN 2022-04-01 2022-02-17 \n",
"5509 NaN 2022-04-01 2022-02-17 \n",
"8294 NaN 2022-04-01 2022-02-17 "
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp_fil1"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of entries with the selected VorgangsID: 11\n",
"Number of unique ObjektIDs among them: 11\n"
]
}
],
"source": [
"# replace missing values (NaN) with False\n",
"temp_fil2 = temp_fil1.fillna(value=False)\n",
"print(f'Number of entries with the selected VorgangsID: {len(temp_fil2)}')\n",
"uni_obj_id = len(temp_fil2['ObjektID'].unique())\n",
"print(f'Number of unique ObjektIDs among them: {uni_obj_id}')"
]
},
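{
"cell_type": "markdown",
"metadata": {},
"source": [
"`fillna(value=False)` writes the boolean `False` into every missing cell, so a genuine `False` can no longer be told apart from a former gap. If the only goal is to find out which columns are constant within one `VorgangsID`, `nunique(dropna=False)` treats missing values as equal and leaves the data unchanged. Below is a minimal sketch of that alternative. The helper `constant_and_varying` and the toy data are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"\n",
"def constant_and_varying(df):\n",
"    # hypothetical helper: distinct values per column, NaN counted as a value of its own\n",
"    n = df.nunique(dropna=False)\n",
"    # columns identical across all rows vs. columns that differ\n",
"    return list(n[n == 1].index), list(n[n > 1].index)\n",
"\n",
"\n",
"# hypothetical toy data resembling one Vorgang spread over several objects\n",
"toy = pd.DataFrame({\n",
"    'VorgangsID': [1, 1, 1],\n",
"    'ObjektID': [10, 11, 12],\n",
"    'VorgangsOrt': [np.nan, np.nan, np.nan],\n",
"})\n",
"constant, varying = constant_and_varying(toy)\n",
"print('constant:', constant)\n",
"print('varying:', varying)\n",
"\n",
"# on the notebook data (assumption, not run here):\n",
"# constant_and_varying(temp_fil1)"
]
},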
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 72,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([ 187, 1792, 186, 2473, 2559, 961, 962, 2166, 1793, 1794, 2165],\n",
|
||
" dtype=int64)"
|
||
]
|
||
},
|
||
"execution_count": 72,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"temp_fil2['ObjektID'].unique()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 55,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>VorgangsID</th>\n",
|
||
" <th>ObjektID</th>\n",
|
||
" <th>HObjektText</th>\n",
|
||
" <th>ObjektArtID</th>\n",
|
||
" <th>ObjektArtText</th>\n",
|
||
" <th>VorgangsTypID</th>\n",
|
||
" <th>VorgangsTypName</th>\n",
|
||
" <th>VorgangsDatum</th>\n",
|
||
" <th>VorgangsStatusId</th>\n",
|
||
" <th>VorgangsPrioritaet</th>\n",
|
||
" <th>VorgangsBeschreibung</th>\n",
|
||
" <th>VorgangsOrt</th>\n",
|
||
" <th>VorgangsArtText</th>\n",
|
||
" <th>ErledigungsDatum</th>\n",
|
||
" <th>ErledigungsArtText</th>\n",
|
||
" <th>ErledigungsBeschreibung</th>\n",
|
||
" <th>MPMelderArbeitsplatz</th>\n",
|
||
" <th>MPAbteilungBezeichnung</th>\n",
|
||
" <th>Arbeitsbeginn</th>\n",
|
||
" <th>ErstellungsDatum</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>288</th>\n",
|
||
" <td>155717</td>\n",
|
||
" <td>187</td>\n",
|
||
" <td>246, Webmaschine Jacquard,</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>Jacquard-Webmaschine</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Wartung</td>\n",
|
||
" <td>2022-04-01</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
|
||
" <td>2022-04-01</td>\n",
|
||
" <td>Intern UTT - Sichtkontrolle</td>\n",
|
||
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>2022-04-01</td>\n",
|
||
" <td>2022-02-17</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2718</th>\n",
|
||
" <td>155717</td>\n",
|
||
" <td>1792</td>\n",
|
||
" <td>A057, Webmaschine Jacquard,</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>Jacquard-Webmaschine</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Wartung</td>\n",
|
||
" <td>2022-04-01</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
|
||
" <td>2022-04-01</td>\n",
|
||
" <td>Intern UTT - Sichtkontrolle</td>\n",
|
||
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>2022-04-01</td>\n",
|
||
" <td>2022-02-17</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2719</th>\n",
|
||
" <td>155717</td>\n",
|
||
" <td>186</td>\n",
|
||
" <td>245 J, Webmaschine Jacquard,</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>Jacquard-Webmaschine</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Wartung</td>\n",
|
||
" <td>2022-04-01</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
|
||
" <td>2022-04-01</td>\n",
|
||
" <td>Intern UTT - Sichtkontrolle</td>\n",
|
||
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>2022-04-01</td>\n",
|
||
" <td>2022-02-17</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2720</th>\n",
|
||
" <td>155717</td>\n",
|
||
" <td>2473</td>\n",
|
||
" <td>A056, Webmaschine Jacquard,</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>Jacquard-Webmaschine</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Wartung</td>\n",
|
||
" <td>2022-04-01</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
|
||
" <td>2022-04-01</td>\n",
|
||
" <td>Intern UTT - Sichtkontrolle</td>\n",
|
||
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>2022-04-01</td>\n",
|
||
" <td>2022-02-17</td>\n",
|
||
" </tr>\n",
" <tr>\n",
" <th>5504</th>\n",
" <td>155717</td>\n",
" <td>2559</td>\n",
" <td>A070, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5505</th>\n",
" <td>155717</td>\n",
" <td>961</td>\n",
" <td>A054, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5506</th>\n",
" <td>155717</td>\n",
" <td>962</td>\n",
" <td>A055, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5507</th>\n",
" <td>155717</td>\n",
" <td>2166</td>\n",
" <td>A061, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5508</th>\n",
" <td>155717</td>\n",
" <td>1793</td>\n",
" <td>A058, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5509</th>\n",
" <td>155717</td>\n",
" <td>1794</td>\n",
" <td>A059, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8294</th>\n",
" <td>155717</td>\n",
" <td>2165</td>\n",
" <td>A060, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" VorgangsID ObjektID HObjektText ObjektArtID \\\n",
"288 155717 187 246, Webmaschine Jacquard, 6 \n",
"2718 155717 1792 A057, Webmaschine Jacquard, 6 \n",
"2719 155717 186 245 J, Webmaschine Jacquard, 6 \n",
"2720 155717 2473 A056, Webmaschine Jacquard, 6 \n",
"5504 155717 2559 A070, Webmaschine Jacquard, 6 \n",
"5505 155717 961 A054, Webmaschine Jacquard, 6 \n",
"5506 155717 962 A055, Webmaschine Jacquard, 6 \n",
"5507 155717 2166 A061, Webmaschine Jacquard, 6 \n",
"5508 155717 1793 A058, Webmaschine Jacquard, 6 \n",
"5509 155717 1794 A059, Webmaschine Jacquard, 6 \n",
"8294 155717 2165 A060, Webmaschine Jacquard, 6 \n",
"\n",
" ObjektArtText VorgangsTypID VorgangsTypName VorgangsDatum \\\n",
"288 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"2718 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"2719 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"2720 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5504 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5505 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5506 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5507 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5508 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5509 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"8294 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"\n",
" VorgangsStatusId VorgangsPrioritaet \\\n",
"288 5 0 \n",
"2718 5 0 \n",
"2719 5 0 \n",
"2720 5 0 \n",
"5504 5 0 \n",
"5505 5 0 \n",
"5506 5 0 \n",
"5507 5 0 \n",
"5508 5 0 \n",
"5509 5 0 \n",
"8294 5 0 \n",
"\n",
" VorgangsBeschreibung VorgangsOrt \\\n",
"288 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"2718 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"2719 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"2720 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"5504 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"5505 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"5506 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"5507 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"5508 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"5509 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"8294 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"\n",
" VorgangsArtText ErledigungsDatum \\\n",
"288 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"2718 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"2719 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"2720 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5504 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5505 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5506 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5507 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5508 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5509 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"8294 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"\n",
" ErledigungsArtText \\\n",
"288 Intern UTT - Sichtkontrolle \n",
"2718 Intern UTT - Sichtkontrolle \n",
"2719 Intern UTT - Sichtkontrolle \n",
"2720 Intern UTT - Sichtkontrolle \n",
"5504 Intern UTT - Sichtkontrolle \n",
"5505 Intern UTT - Sichtkontrolle \n",
"5506 Intern UTT - Sichtkontrolle \n",
"5507 Intern UTT - Sichtkontrolle \n",
"5508 Intern UTT - Sichtkontrolle \n",
"5509 Intern UTT - Sichtkontrolle \n",
"8294 Intern UTT - Sichtkontrolle \n",
"\n",
" ErledigungsBeschreibung MPMelderArbeitsplatz \\\n",
"288 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"2718 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"2719 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"2720 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"5504 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"5505 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"5506 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"5507 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"5508 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"5509 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"8294 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"\n",
" MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n",
"288 False 2022-04-01 2022-02-17 \n",
"2718 False 2022-04-01 2022-02-17 \n",
"2719 False 2022-04-01 2022-02-17 \n",
"2720 False 2022-04-01 2022-02-17 \n",
"5504 False 2022-04-01 2022-02-17 \n",
"5505 False 2022-04-01 2022-02-17 \n",
"5506 False 2022-04-01 2022-02-17 \n",
"5507 False 2022-04-01 2022-02-17 \n",
"5508 False 2022-04-01 2022-02-17 \n",
"5509 False 2022-04-01 2022-02-17 \n",
"8294 False 2022-04-01 2022-02-17 "
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp_fil2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Question: Can several ObjektIDs be assigned to a single Vorgang? If so, why do they have different completion dates (ErledigungsDatum)?*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Length of the descriptions**"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [],
"source": [
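"# turn the Series of descriptions into a DataFrame\n",
"descriptions = descriptions.to_frame()\n",
"# character count per description (Series.str.len replaces the deprecated DataFrame.applymap)\n",
"descriptions['length_description'] = descriptions['VorgangsBeschreibung'].str.len()\n",
"# longest descriptions first\n",
"descriptions = descriptions.sort_values(by=['length_description'], ascending=False)"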
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count 124008.000000\n",
"mean 70.351751\n",
"std 53.080901\n",
"min 1.000000\n",
"25% 66.000000\n",
"50% 66.000000\n",
"75% 67.000000\n",
"max 3137.000000\n",
"Name: length_description, dtype: float64"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# descriptive statistics of the description lengths\n",
"len_descr = descriptions['length_description']\n",
"len_descr.describe()"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>VorgangsBeschreibung</th>\n",
" <th>length_description</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>8704</th>\n",
" <td>Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /...</td>\n",
" <td>3137</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7826</th>\n",
" <td>Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /...</td>\n",
" <td>3137</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49779</th>\n",
" <td>Laut Wartungsvertrag (Hr.Radtke) Bestellnummer...</td>\n",
" <td>2311</td>\n",
" </tr>\n",
" <tr>\n",
" <th>124118</th>\n",
" <td>Laut Wartungsvertrag (Hr.Radtke) Bestellnummer...</td>\n",
" <td>2311</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14853</th>\n",
" <td>Laut Wartungsvertrag (Hr.Radtke) Bestellnummer...</td>\n",
" <td>2311</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" VorgangsBeschreibung length_description\n",
"8704 Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /... 3137\n",
"7826 Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /... 3137\n",
"49779 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311\n",
"124118 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311\n",
"14853 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"descriptions.head()"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>VorgangsBeschreibung</th>\n",
" <th>length_description</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>8704</th>\n",
" <td>Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /...</td>\n",
" <td>3137</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7826</th>\n",
" <td>Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /...</td>\n",
" <td>3137</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49779</th>\n",
" <td>Laut Wartungsvertrag (Hr.Radtke) Bestellnummer...</td>\n",
" <td>2311</td>\n",
" </tr>\n",
" <tr>\n",
" <th>124118</th>\n",
" <td>Laut Wartungsvertrag (Hr.Radtke) Bestellnummer...</td>\n",
" <td>2311</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14853</th>\n",
" <td>Laut Wartungsvertrag (Hr.Radtke) Bestellnummer...</td>\n",
" <td>2311</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13450</th>\n",
" <td></td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13451</th>\n",
" <td></td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29979</th>\n",
" <td></td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13452</th>\n",
" <td></td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21214</th>\n",
" <td>\\n</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>124008 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" VorgangsBeschreibung length_description\n",
"8704 Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /... 3137\n",
"7826 Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /... 3137\n",
"49779 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311\n",
"124118 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311\n",
"14853 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311\n",
"... ... ...\n",
"13450 1\n",
"13451 1\n",
"29979 1\n",
"13452 1\n",
"21214 \\n 1\n",
"\n",
"[124008 rows x 2 columns]"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"descriptions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}