lang-main/notebooks/archive/Analyse_2.ipynb
2024-08-07 20:06:06 +02:00

11661 lines
412 KiB

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# **Analysis 2**\n",
"\n",
"## Strategy & Focus\n",
"\n",
"- Focus on Export 4/5 (largest datasets)\n",
"  - each dataset belongs to a different customer\n",
"  - as a result: mismatches between IDs and their associated descriptions; ``ObjektID`` values assigned more than once\n",
"\n",
"### Feature 1 - Task descriptions:\n",
"\n",
"- Analyse ``VorgangsBeschreibung`` for possible clusters:\n",
"  - possibly derive standardised, selectable descriptions\n",
"  - typical terms and recurring occurrences\n",
"- Additional information from ``VorgangsArtText``:\n",
"  - partly standardised\n",
"  - *Is the link to ``VorgangsBeschreibung`` semantically correct?*\n",
"- Additional information from ``VorgangsTypName`` together with ``VorgangsTypID``:\n",
"  - definitely standardised\n",
"  - *Number of unique types?*\n",
"\n",
"### Feature 2 - Temporal relations within the tasks\n",
"\n",
"- *Identify objects that occur frequently*\n",
"- *Examine the time intervals between creation, planning, and completion:*\n",
"  - Creation: ``ErstellungsDatum``\n",
"  - Planning: ``VorgangsDatum``\n",
"  - Completion: ``ErledigungsDatum``\n",
"- *Intervals between two similar failure patterns, either for every object or for the most frequently occurring objects*"
]
},
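{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the interval analysis planned for Feature 2, assuming the three date columns are already parsed as datetimes. The frame ``toy_dates`` and the two result column names are hypothetical; the dates are copied from two rows that appear in the data previews further down in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# toy frame with the three date columns named in the list above\n",
"toy_dates = pd.DataFrame({\n",
"    'ErstellungsDatum': pd.to_datetime(['2019-03-22', '2022-09-30']),\n",
"    'VorgangsDatum': pd.to_datetime(['2019-03-22', '2022-09-30']),\n",
"    'ErledigungsDatum': pd.to_datetime(['2019-03-25', '2022-10-04']),\n",
"})\n",
"\n",
"# subtracting two datetime columns yields a timedelta column\n",
"toy_dates['creation_to_planning'] = toy_dates['VorgangsDatum'] - toy_dates['ErstellungsDatum']\n",
"toy_dates['planning_to_completion'] = toy_dates['ErledigungsDatum'] - toy_dates['VorgangsDatum']\n",
"\n",
"# creation_to_planning is 0 days in both rows; planning_to_completion is 3 and 4 days\n",
"print(toy_dates[['creation_to_planning', 'planning_to_completion']])"
]
},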
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"# Feature 1: Clustering of task descriptions\n",
"\n",
"## Background research\n",
"[Textmining HS Hannover](https://textmining.wp.hs-hannover.de/Preprocessing.html)\n",
"\n",
"### General decomposition of the individual descriptions\n",
"\n",
"- Text into sentences\n",
"- Sentences into words\n",
"- Words into their base form:\n",
"  - Lemma: the form of the word as it is listed in a dictionary, e.g. Haus, laufen, begründen\n",
"  - Stem: the word without inflectional endings (prefixes and suffixes), e.g. Haus, lauf, begründ\n",
"  - Root: the core of the word from which the word may have been formed by derivation, e.g. Haus, lauf, Grund\n",
"- Word class identification\n",
"  - classic part-of-speech tagging (conventional word classes)\n",
"  - Named Entity Recognition (NER) (proper names)\n",
"    - e.g. spaCy: person, location, organisation, miscellaneous\n",
"\n",
"#### Semantics\n",
"\n",
"- Words within the same sentence are more closely related to each other than to words outside it\n",
"\n",
"### Packages\n",
"\n",
"- English:\n",
"  - [NLTK](https://www.nltk.org/)\n",
"- German:\n",
"  - [HanTa - The Hanover Tagger](https://github.com/wartaal/HanTa/tree/master)\n",
"  - [TreeTagger](https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/)\n",
"    - [Python Wrapper](https://treetaggerwrapper.readthedocs.io/en/latest/)\n",
"  - [spaCy](https://spacy.io/)\n",
"    - [Example 1](https://www.trinnovative.de/blog/2020-09-08-natural-language-processing-mit-spacy.html)"
]
},
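{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first two decomposition steps (text into sentences, sentences into words) can be sketched with a blank spaCy pipeline, which needs no model download. This is only an illustration: ``nlp_blank`` and the sample text are assumptions made for this sketch, and the rule-based ``sentencizer`` splits on punctuation only. Lemmas and part-of-speech tags require a trained pipeline such as the one loaded in the analysis section."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import spacy\n",
"\n",
"# blank German pipeline: tokenizer only, no trained components\n",
"nlp_blank = spacy.blank('de')\n",
"# rule-based sentence splitter (punctuation based)\n",
"nlp_blank.add_pipe('sentencizer')\n",
"\n",
"# illustrative text in the style of the maintenance descriptions\n",
"doc = nlp_blank('Bolzen gebrochen. Bolzen neu angefertigt und montiert.')\n",
"\n",
"# text -> sentences -> words\n",
"for sent in doc.sents:\n",
"    print([token.text for token in sent])"
]
},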
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analysis"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import spacy\n",
"from collections import Counter\n",
"from itertools import combinations\n",
"from dateutil.parser import parse\n",
"import re\n",
"from spellchecker import SpellChecker\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"import logging\n",
"import sys\n",
"import pickle\n",
"\n",
"LOGGING_LEVEL = 'INFO'\n",
"logging.basicConfig(level=LOGGING_LEVEL, stream=sys.stdout)\n",
"logger = logging.getLogger('base')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def save_pickle(obj, path):\n",
"    with open(path, 'wb') as file:\n",
"        pickle.dump(obj, file, protocol=pickle.HIGHEST_PROTOCOL)\n",
"\n",
"def load_pickle(path):\n",
"    with open(path, 'rb') as file:\n",
"        obj = pickle.load(file)\n",
"    return obj"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"sns.set()\n",
"LOAD_CALC_FILES = False\n",
"\n",
"DESC_BLACKLIST = set(['-'])\n",
"\"\"\"\n",
"GENERAL_BLACKLIST = set([\n",
"    'herr', 'hr.', 'förster', 'graf', 'stöppel', \n",
"    'stab', 'kw', 'h.', 'koch', 'heininger', '.',\n",
"    'schwab', 'm.', 'wenninger', '-', '--',\n",
"])\n",
"\"\"\"\n",
"\n",
"GENERAL_BLACKLIST = set([\n",
"    'herr', 'hr.', 'kw', 'h.', '.',\n",
"    'm.', '-', '--', 'dr.', 'dr',\n",
"])\n",
"\n",
"#GENERAL_BLACKLIST = set()\n",
"#POS_of_interest = set(['NOUN', 'PROPN', 'ADJ', 'VERB', 'AUX'])\n",
"POS_of_interest = set(['NOUN', 'ADJ', 'VERB', 'AUX'])\n",
"TAG_of_interest = set(['ADJD'])"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# load language model\n",
"nlp = spacy.load('de_dep_news_trf')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 129020 entries, 0 to 129019\n",
"Data columns (total 20 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 VorgangsID 129020 non-null int64 \n",
" 1 ObjektID 129020 non-null int64 \n",
" 2 HObjektText 129003 non-null object \n",
" 3 ObjektArtID 129020 non-null int64 \n",
" 4 ObjektArtText 128372 non-null object \n",
" 5 VorgangsTypID 129020 non-null int64 \n",
" 6 VorgangsTypName 129020 non-null object \n",
" 7 VorgangsDatum 129020 non-null datetime64[ns]\n",
" 8 VorgangsStatusId 129020 non-null int64 \n",
" 9 VorgangsPrioritaet 129020 non-null int64 \n",
" 10 VorgangsBeschreibung 124087 non-null object \n",
" 11 VorgangsOrt 507 non-null object \n",
" 12 VorgangsArtText 129020 non-null object \n",
" 13 ErledigungsDatum 129020 non-null datetime64[ns]\n",
" 14 ErledigungsArtText 128474 non-null object \n",
" 15 ErledigungsBeschreibung 118135 non-null object \n",
" 16 MPMelderArbeitsplatz 6359 non-null object \n",
" 17 MPAbteilungBezeichnung 6359 non-null object \n",
" 18 Arbeitsbeginn 123538 non-null datetime64[ns]\n",
" 19 ErstellungsDatum 129020 non-null datetime64[ns]\n",
"dtypes: datetime64[ns](4), int64(6), object(10)\n",
"memory usage: 19.7+ MB\n"
]
}
],
"source": [
"# load dataset\n",
"FILE_PATH = '01_2_Rohdaten_neu/Export4.csv'\n",
"date_cols = ['VorgangsDatum', 'ErledigungsDatum', 'Arbeitsbeginn', 'ErstellungsDatum']\n",
"raw = pd.read_csv(filepath_or_buffer=FILE_PATH, sep=';', encoding='cp1252', parse_dates=date_cols, dayfirst=True)\n",
"raw.info()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>VorgangsID</th>\n",
" <th>ObjektID</th>\n",
" <th>HObjektText</th>\n",
" <th>ObjektArtID</th>\n",
" <th>ObjektArtText</th>\n",
" <th>VorgangsTypID</th>\n",
" <th>VorgangsTypName</th>\n",
" <th>VorgangsDatum</th>\n",
" <th>VorgangsStatusId</th>\n",
" <th>VorgangsPrioritaet</th>\n",
" <th>VorgangsBeschreibung</th>\n",
" <th>VorgangsOrt</th>\n",
" <th>VorgangsArtText</th>\n",
" <th>ErledigungsDatum</th>\n",
" <th>ErledigungsArtText</th>\n",
" <th>ErledigungsBeschreibung</th>\n",
" <th>MPMelderArbeitsplatz</th>\n",
" <th>MPAbteilungBezeichnung</th>\n",
" <th>Arbeitsbeginn</th>\n",
" <th>ErstellungsDatum</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>11</td>\n",
" <td>114</td>\n",
" <td>427 C , Webmaschine, DL 280 EMS Breite 280</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-06</td>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Kettbaum kaputt</td>\n",
" <td>2019-03-06</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Weberei</td>\n",
" <td>Weberei</td>\n",
" <td>NaT</td>\n",
" <td>2019-03-06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>17</td>\n",
" <td>124</td>\n",
" <td>621 C , Webmaschine, DL 280 EMS Breite 280</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-11</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>asgasdg</td>\n",
" <td>2019-03-11</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Elektrowerkstatt</td>\n",
" <td>Elektrowerkstatt</td>\n",
" <td>NaT</td>\n",
" <td>2019-03-11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>53</td>\n",
" <td>244</td>\n",
" <td>285 C, Webmaschine, SG 220 EMS</td>\n",
" <td>5</td>\n",
" <td>Greifer-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-19</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Kupplung schleift</td>\n",
" <td>NaN</td>\n",
" <td>Kupplung defekt</td>\n",
" <td>2019-03-20</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>NaN</td>\n",
" <td>Weberei</td>\n",
" <td>Weberei</td>\n",
" <td>NaT</td>\n",
" <td>2019-03-19</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>58</td>\n",
" <td>257</td>\n",
" <td>107, Webmaschine, OM 220 EOS</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-21</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Gegengewicht wieder anbringen</td>\n",
" <td>NaN</td>\n",
" <td>Gegengewicht an der Webmaschine abgefallen</td>\n",
" <td>2019-03-21</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>Schraube ausgebohrt\\nGegengewicht wieder angeb...</td>\n",
" <td>Weberei</td>\n",
" <td>Weberei</td>\n",
" <td>2019-03-21</td>\n",
" <td>2019-03-21</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>81</td>\n",
" <td>138</td>\n",
" <td>00138, Schärmaschine 9,</td>\n",
" <td>16</td>\n",
" <td>Schärmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-25</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>da ist etwas gebrochen. (Herr Heininger)</td>\n",
" <td>NaN</td>\n",
" <td>zentrale Bremsenverstellung linke Gatterseite ...</td>\n",
" <td>2019-03-25</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>Bolzen gebrochen. Bolzen neu angefertig und di...</td>\n",
" <td>Vorwerk</td>\n",
" <td>Vorwerk</td>\n",
" <td>2019-03-25</td>\n",
" <td>2019-03-25</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" VorgangsID ObjektID HObjektText \\\n",
"0 11 114 427 C , Webmaschine, DL 280 EMS Breite 280 \n",
"1 17 124 621 C , Webmaschine, DL 280 EMS Breite 280 \n",
"2 53 244 285 C, Webmaschine, SG 220 EMS \n",
"3 58 257 107, Webmaschine, OM 220 EOS \n",
"4 81 138 00138, Schärmaschine 9, \n",
"\n",
" ObjektArtID ObjektArtText VorgangsTypID VorgangsTypName \\\n",
"0 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n",
"1 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n",
"2 5 Greifer-Webmaschine 3 Reparaturauftrag (Portal) \n",
"3 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n",
"4 16 Schärmaschine 3 Reparaturauftrag (Portal) \n",
"\n",
" VorgangsDatum VorgangsStatusId VorgangsPrioritaet \\\n",
"0 2019-03-06 4 0 \n",
"1 2019-03-11 5 0 \n",
"2 2019-03-19 5 0 \n",
"3 2019-03-21 5 0 \n",
"4 2019-03-25 5 0 \n",
"\n",
" VorgangsBeschreibung VorgangsOrt \\\n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 Kupplung schleift NaN \n",
"3 Gegengewicht wieder anbringen NaN \n",
"4 da ist etwas gebrochen. (Herr Heininger) NaN \n",
"\n",
" VorgangsArtText ErledigungsDatum \\\n",
"0 Kettbaum kaputt 2019-03-06 \n",
"1 asgasdg 2019-03-11 \n",
"2 Kupplung defekt 2019-03-20 \n",
"3 Gegengewicht an der Webmaschine abgefallen 2019-03-21 \n",
"4 zentrale Bremsenverstellung linke Gatterseite ... 2019-03-25 \n",
"\n",
" ErledigungsArtText ErledigungsBeschreibung \\\n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 Reparatur UTT NaN \n",
"3 Reparatur UTT Schraube ausgebohrt\\nGegengewicht wieder angeb... \n",
"4 Reparatur UTT Bolzen gebrochen. Bolzen neu angefertig und di... \n",
"\n",
" MPMelderArbeitsplatz MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n",
"0 Weberei Weberei NaT 2019-03-06 \n",
"1 Elektrowerkstatt Elektrowerkstatt NaT 2019-03-11 \n",
"2 Weberei Weberei NaT 2019-03-19 \n",
"3 Weberei Weberei 2019-03-21 2019-03-21 \n",
"4 Vorwerk Vorwerk 2019-03-25 2019-03-25 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"raw.head()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Anzahl Features: 20\n"
]
}
],
"source": [
"print(f\"Anzahl Features: {len(raw.columns)}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**New features compared with the previous analysis:**\n",
"- ``ObjektArtID``\n",
"- ``ObjektArtText``\n",
"- ``VorgangsTypName``"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Duplicates"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"duplicates_filt = raw.duplicated()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Anzahl Duplikate: 84\n"
]
}
],
"source": [
"print(f\"Anzahl Duplikate: {duplicates_filt.sum()}\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"filt_data = raw[duplicates_filt]\n",
"uni_obj_id_dupl = filt_data['ObjektID'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Anzahl einzigartiger Objekt-IDs unter Duplikaten: 47\n"
]
}
],
"source": [
"print(f\"Anzahl einzigartiger Objekt-IDs unter Duplikaten: {len(uni_obj_id_dupl)}\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 128936 entries, 0 to 128935\n",
"Data columns (total 20 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 VorgangsID 128936 non-null int64 \n",
" 1 ObjektID 128936 non-null int64 \n",
" 2 HObjektText 128920 non-null object \n",
" 3 ObjektArtID 128936 non-null int64 \n",
" 4 ObjektArtText 128289 non-null object \n",
" 5 VorgangsTypID 128936 non-null int64 \n",
" 6 VorgangsTypName 128936 non-null object \n",
" 7 VorgangsDatum 128936 non-null datetime64[ns]\n",
" 8 VorgangsStatusId 128936 non-null int64 \n",
" 9 VorgangsPrioritaet 128936 non-null int64 \n",
" 10 VorgangsBeschreibung 124008 non-null object \n",
" 11 VorgangsOrt 507 non-null object \n",
" 12 VorgangsArtText 128936 non-null object \n",
" 13 ErledigungsDatum 128936 non-null datetime64[ns]\n",
" 14 ErledigungsArtText 128402 non-null object \n",
" 15 ErledigungsBeschreibung 118086 non-null object \n",
" 16 MPMelderArbeitsplatz 6337 non-null object \n",
" 17 MPAbteilungBezeichnung 6337 non-null object \n",
" 18 Arbeitsbeginn 123480 non-null datetime64[ns]\n",
" 19 ErstellungsDatum 128936 non-null datetime64[ns]\n",
"dtypes: datetime64[ns](4), int64(6), object(10)\n",
"memory usage: 19.7+ MB\n"
]
}
],
"source": [
"wo_duplicates = raw.drop_duplicates(ignore_index=True)\n",
"wo_duplicates.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ``VorgangsBeschreibung``"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### **NA values and duplicates**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"String cleaning"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"SPECIAL_CHARS = set(['&', '$', '%', '§', '/', '(', ')', '_', \n",
" '+', '', '--', '<', '>', '´',\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def clean_string(string: str) -> str:\n",
"    #num_reps = 5\n",
"    \n",
"    # replace whitespace control characters (tab, newline, ...) with a space\n",
"    pattern = r'[\\t\\n\\r\\f\\v]'\n",
"    string = re.sub(pattern, ' ', string)\n",
"    # remove dates (three number groups separated by '.' or ':')\n",
"    pattern = r'[\\d]{1,4}[.:][\\d]{1,4}[.:][\\d]{1,4}'\n",
"    string = re.sub(pattern, '', string)\n",
"    # remove times written with two colons\n",
"    pattern = r'[\\d]{1,2}[:][\\d]{1,2}[:][\\d]{0,2}'\n",
"    string = re.sub(pattern, '', string)\n",
"    # remove all chars except punctuation and alphanumeric ones\n",
"    pattern = r'[^ \\w.,;:\\-äöüÄÖÜ]+'\n",
"    string = re.sub(pattern, '', string)\n",
"    # replace a hyphen surrounded by non-word chars (used as a dash) with a space\n",
"    pattern = r'[\\W]+-[\\W]+'\n",
"    string = re.sub(pattern, ' ', string)\n",
"    # remove whitespaces in front of punctuation\n",
"    pattern = r'[ ]+([;,.:])'\n",
"    string = re.sub(pattern, r'\\1', string)\n",
"    # collapse multiple whitespaces into one\n",
"    pattern = r'[ ]+'\n",
"    string = re.sub(pattern, ' ', string)\n",
"    # remove whitespaces at the beginning and the end\n",
"    string = string.strip()\n",
"    \n",
"    #while num_reps != 0:\n",
"    #string = string.replace('\\n', ' ')\n",
"    #string = string.replace('\\t', ' ')\n",
"    #string = string.replace(' ', ' ')\n",
"    #string = string.replace(' ', ' ')\n",
"    #string = string.replace(' - ', ' ')\n",
"    \"\"\"\n",
"    for char in SPECIAL_CHARS:\n",
"        string = string.replace(char, '')\n",
"    \n",
"    #num_reps -= 1\n",
"    \n",
"    # remove spaces at the beginning and the end\n",
"    string = string.strip()\n",
"    \"\"\"\n",
"    \n",
"    return string"
]
},
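{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick check of ``clean_string`` from the cell above on a few inputs. The first two strings are taken from the data preview earlier in this notebook, the third is made up for illustration; the list name ``examples`` is hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"examples = [\n",
"    'da ist etwas gebrochen. (Herr Heininger)',\n",
"    'Schrauben ausgebohrt\\t\\nGewinde nachgeschnitten\\t',\n",
"    '12.03.2019 Lager defekt',\n",
"]\n",
"\n",
"# expected results:\n",
"#   'da ist etwas gebrochen. Herr Heininger'        (brackets removed)\n",
"#   'Schrauben ausgebohrt Gewinde nachgeschnitten'  (tabs and newlines collapsed)\n",
"#   'Lager defekt'                                  (date removed)\n",
"for example in examples:\n",
"    print(repr(clean_string(example)))"
]
},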
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"base = wo_duplicates.copy()\n",
"base = base.dropna(axis=0, subset='VorgangsBeschreibung')\n",
"base['VorgangsBeschreibung'] = base['VorgangsBeschreibung'].map(clean_string)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>VorgangsID</th>\n",
" <th>ObjektID</th>\n",
" <th>HObjektText</th>\n",
" <th>ObjektArtID</th>\n",
" <th>ObjektArtText</th>\n",
" <th>VorgangsTypID</th>\n",
" <th>VorgangsTypName</th>\n",
" <th>VorgangsDatum</th>\n",
" <th>VorgangsStatusId</th>\n",
" <th>VorgangsPrioritaet</th>\n",
" <th>VorgangsBeschreibung</th>\n",
" <th>VorgangsOrt</th>\n",
" <th>VorgangsArtText</th>\n",
" <th>ErledigungsDatum</th>\n",
" <th>ErledigungsArtText</th>\n",
" <th>ErledigungsBeschreibung</th>\n",
" <th>MPMelderArbeitsplatz</th>\n",
" <th>MPAbteilungBezeichnung</th>\n",
" <th>Arbeitsbeginn</th>\n",
" <th>ErstellungsDatum</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>53</td>\n",
" <td>244</td>\n",
" <td>285 C, Webmaschine, SG 220 EMS</td>\n",
" <td>5</td>\n",
" <td>Greifer-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-19</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Kupplung schleift</td>\n",
" <td>NaN</td>\n",
" <td>Kupplung defekt</td>\n",
" <td>2019-03-20</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>NaN</td>\n",
" <td>Weberei</td>\n",
" <td>Weberei</td>\n",
" <td>NaT</td>\n",
" <td>2019-03-19</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>58</td>\n",
" <td>257</td>\n",
" <td>107, Webmaschine, OM 220 EOS</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-21</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Gegengewicht wieder anbringen</td>\n",
" <td>NaN</td>\n",
" <td>Gegengewicht an der Webmaschine abgefallen</td>\n",
" <td>2019-03-21</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>Schraube ausgebohrt\\nGegengewicht wieder angeb...</td>\n",
" <td>Weberei</td>\n",
" <td>Weberei</td>\n",
" <td>2019-03-21</td>\n",
" <td>2019-03-21</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>81</td>\n",
" <td>138</td>\n",
" <td>00138, Schärmaschine 9,</td>\n",
" <td>16</td>\n",
" <td>Schärmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-25</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>da ist etwas gebrochen. Herr Heininger</td>\n",
" <td>NaN</td>\n",
" <td>zentrale Bremsenverstellung linke Gatterseite ...</td>\n",
" <td>2019-03-25</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>Bolzen gebrochen. Bolzen neu angefertig und di...</td>\n",
" <td>Vorwerk</td>\n",
" <td>Vorwerk</td>\n",
" <td>2019-03-25</td>\n",
" <td>2019-03-25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>82</td>\n",
" <td>0</td>\n",
" <td>Warenschau allgemein</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-25</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Klappbügel Portalkran H31 defekt</td>\n",
" <td>Warenschau allgemein</td>\n",
" <td>Allgemeine Reparaturarbeiten</td>\n",
" <td>2019-03-25</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>Feder ausgetauscht</td>\n",
" <td>Warenschau</td>\n",
" <td>Warenschau</td>\n",
" <td>2019-03-25</td>\n",
" <td>2019-03-25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>76</td>\n",
" <td>0</td>\n",
" <td>Neben der Türe</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-22</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Schraube nix mer gut</td>\n",
" <td>Neben der Türe</td>\n",
" <td>Kettbaum</td>\n",
" <td>2019-03-25</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>Schrauben ausgebohrt\\t\\nGewinde nachgeschnitten\\t</td>\n",
" <td>Vorwerk</td>\n",
" <td>Vorwerk</td>\n",
" <td>2019-03-25</td>\n",
" <td>2019-03-22</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128931</th>\n",
" <td>518956</td>\n",
" <td>1708</td>\n",
" <td>01708, Betriebsfahrräder Schlosserei,</td>\n",
" <td>57</td>\n",
" <td>Interne Wartungsobjekte</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2023-06-19</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>2-wöchige Reinigung Sichtkontrolle Technische ...</td>\n",
" <td>NaN</td>\n",
" <td>02 Interne Reinigung / Pflege / Überprüfung</td>\n",
" <td>2023-06-19</td>\n",
" <td>Intern UTT - Prüfung</td>\n",
" <td>Reinigung &amp; Sichtkontrolle (Technische Einric...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2023-06-19</td>\n",
" <td>2023-03-14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128932</th>\n",
" <td>275123</td>\n",
" <td>1654</td>\n",
" <td>WEBEREI ALLGEMEIN, Weberei allgemein,</td>\n",
" <td>90</td>\n",
" <td>UTT allgemein</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2022-09-29</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Adapter entfernen und Gewinde nachschneiden.</td>\n",
" <td>NaN</td>\n",
" <td>Kettbaum-Adapter</td>\n",
" <td>2022-09-30</td>\n",
" <td>Intern UTT - Reparatur</td>\n",
" <td>mit schlosserei aufräumen</td>\n",
" <td>Weberei</td>\n",
" <td>Weberei</td>\n",
" <td>2022-09-30</td>\n",
" <td>2022-09-29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128933</th>\n",
" <td>275125</td>\n",
" <td>1795</td>\n",
" <td>A054.S, Jacquardmaschine,</td>\n",
" <td>24</td>\n",
" <td>Stäubli-Jacquardmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2022-09-30</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Alle 4 Schrauben und teile der Kettbaumlagerun...</td>\n",
" <td>NaN</td>\n",
" <td>Kettbaum</td>\n",
" <td>2022-09-30</td>\n",
" <td>Intern UTT - Reparatur</td>\n",
" <td>Neues Teil eingebaut und altes repariert</td>\n",
" <td>Weberei</td>\n",
" <td>Weberei</td>\n",
" <td>2022-09-30</td>\n",
" <td>2022-09-30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128934</th>\n",
" <td>275188</td>\n",
" <td>1</td>\n",
" <td>00001, Ausrüstungsanlage 1,</td>\n",
" <td>1</td>\n",
" <td>Waschmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2022-09-30</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>Walzenlager WK 6 überprüfenauswechseln</td>\n",
" <td>NaN</td>\n",
" <td>Lagereinheit (Wälzlager, Kugellager, etc.)</td>\n",
" <td>2022-10-04</td>\n",
" <td>Intern UTT - Reparatur</td>\n",
" <td>Lager getauscht</td>\n",
" <td>Ausrüstung</td>\n",
" <td>Ausrüstung</td>\n",
" <td>2022-10-04</td>\n",
" <td>2022-09-30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128935</th>\n",
" <td>275219</td>\n",
" <td>326</td>\n",
" <td>B38, Niederhubwagen,</td>\n",
" <td>32</td>\n",
" <td>Flurförderzeuge / Putzmaschine / Rasenmäher</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2022-10-03</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Befestigung Deckel für Batteriefach defekt Hal...</td>\n",
" <td>NaN</td>\n",
" <td>Flurförderzeug</td>\n",
" <td>2022-10-05</td>\n",
" <td>Intern UTT - Reparatur</td>\n",
" <td>Neue Gasfeder eingebaut</td>\n",
" <td>Warenschau</td>\n",
" <td>Warenschau</td>\n",
" <td>2022-10-04</td>\n",
" <td>2022-10-03</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>124008 rows × 20 columns</p>\n",
"</div>"
],
"text/plain": [
" VorgangsID ObjektID HObjektText \\\n",
"2 53 244 285 C, Webmaschine, SG 220 EMS \n",
"3 58 257 107, Webmaschine, OM 220 EOS \n",
"4 81 138 00138, Schärmaschine 9, \n",
"5 82 0 Warenschau allgemein \n",
"6 76 0 Neben der Türe \n",
"... ... ... ... \n",
"128931 518956 1708 01708, Betriebsfahrräder Schlosserei, \n",
"128932 275123 1654 WEBEREI ALLGEMEIN, Weberei allgemein, \n",
"128933 275125 1795 A054.S, Jacquardmaschine, \n",
"128934 275188 1 00001, Ausrüstungsanlage 1, \n",
"128935 275219 326 B38, Niederhubwagen, \n",
"\n",
" ObjektArtID ObjektArtText \\\n",
"2 5 Greifer-Webmaschine \n",
"3 3 Luft-Webmaschine \n",
"4 16 Schärmaschine \n",
"5 0 NaN \n",
"6 0 NaN \n",
"... ... ... \n",
"128931 57 Interne Wartungsobjekte \n",
"128932 90 UTT allgemein \n",
"128933 24 Stäubli-Jacquardmaschine \n",
"128934 1 Waschmaschine \n",
"128935 32 Flurförderzeuge / Putzmaschine / Rasenmäher \n",
"\n",
" VorgangsTypID VorgangsTypName VorgangsDatum \\\n",
"2 3 Reparaturauftrag (Portal) 2019-03-19 \n",
"3 3 Reparaturauftrag (Portal) 2019-03-21 \n",
"4 3 Reparaturauftrag (Portal) 2019-03-25 \n",
"5 3 Reparaturauftrag (Portal) 2019-03-25 \n",
"6 3 Reparaturauftrag (Portal) 2019-03-22 \n",
"... ... ... ... \n",
"128931 1 Wartung 2023-06-19 \n",
"128932 3 Reparaturauftrag (Portal) 2022-09-29 \n",
"128933 3 Reparaturauftrag (Portal) 2022-09-30 \n",
"128934 3 Reparaturauftrag (Portal) 2022-09-30 \n",
"128935 3 Reparaturauftrag (Portal) 2022-10-03 \n",
"\n",
" VorgangsStatusId VorgangsPrioritaet \\\n",
"2 5 0 \n",
"3 5 0 \n",
"4 5 0 \n",
"5 5 0 \n",
"6 5 0 \n",
"... ... ... \n",
"128931 5 0 \n",
"128932 5 0 \n",
"128933 5 0 \n",
"128934 5 1 \n",
"128935 5 0 \n",
"\n",
" VorgangsBeschreibung \\\n",
"2 Kupplung schleift \n",
"3 Gegengewicht wieder anbringen \n",
"4 da ist etwas gebrochen. Herr Heininger \n",
"5 Klappbügel Portalkran H31 defekt \n",
"6 Schraube nix mer gut \n",
"... ... \n",
"128931 2-wöchige Reinigung Sichtkontrolle Technische ... \n",
"128932 Adapter entfernen und Gewinde nachschneiden. \n",
"128933 Alle 4 Schrauben und teile der Kettbaumlagerun... \n",
"128934 Walzenlager WK 6 überprüfenauswechseln \n",
"128935 Befestigung Deckel für Batteriefach defekt Hal... \n",
"\n",
" VorgangsOrt \\\n",
"2 NaN \n",
"3 NaN \n",
"4 NaN \n",
"5 Warenschau allgemein \n",
"6 Neben der Türe \n",
"... ... \n",
"128931 NaN \n",
"128932 NaN \n",
"128933 NaN \n",
"128934 NaN \n",
"128935 NaN \n",
"\n",
" VorgangsArtText ErledigungsDatum \\\n",
"2 Kupplung defekt 2019-03-20 \n",
"3 Gegengewicht an der Webmaschine abgefallen 2019-03-21 \n",
"4 zentrale Bremsenverstellung linke Gatterseite ... 2019-03-25 \n",
"5 Allgemeine Reparaturarbeiten 2019-03-25 \n",
"6 Kettbaum 2019-03-25 \n",
"... ... ... \n",
"128931 02 Interne Reinigung / Pflege / Überprüfung 2023-06-19 \n",
"128932 Kettbaum-Adapter 2022-09-30 \n",
"128933 Kettbaum 2022-09-30 \n",
"128934 Lagereinheit (Wälzlager, Kugellager, etc.) 2022-10-04 \n",
"128935 Flurförderzeug 2022-10-05 \n",
"\n",
" ErledigungsArtText \\\n",
"2 Reparatur UTT \n",
"3 Reparatur UTT \n",
"4 Reparatur UTT \n",
"5 Reparatur UTT \n",
"6 Reparatur UTT \n",
"... ... \n",
"128931 Intern UTT - Prüfung \n",
"128932 Intern UTT - Reparatur \n",
"128933 Intern UTT - Reparatur \n",
"128934 Intern UTT - Reparatur \n",
"128935 Intern UTT - Reparatur \n",
"\n",
" ErledigungsBeschreibung \\\n",
"2 NaN \n",
"3 Schraube ausgebohrt\\nGegengewicht wieder angeb... \n",
"4 Bolzen gebrochen. Bolzen neu angefertig und di... \n",
"5 Feder ausgetauscht \n",
"6 Schrauben ausgebohrt\\t\\nGewinde nachgeschnitten\\t \n",
"... ... \n",
"128931 Reinigung & Sichtkontrolle (Technische Einric... \n",
"128932 mit schlosserei aufräumen \n",
"128933 Neues Teil eingebaut und altes repariert \n",
"128934 Lager getauscht \n",
"128935 Neue Gasfeder eingebaut \n",
"\n",
" MPMelderArbeitsplatz MPAbteilungBezeichnung Arbeitsbeginn \\\n",
"2 Weberei Weberei NaT \n",
"3 Weberei Weberei 2019-03-21 \n",
"4 Vorwerk Vorwerk 2019-03-25 \n",
"5 Warenschau Warenschau 2019-03-25 \n",
"6 Vorwerk Vorwerk 2019-03-25 \n",
"... ... ... ... \n",
"128931 NaN NaN 2023-06-19 \n",
"128932 Weberei Weberei 2022-09-30 \n",
"128933 Weberei Weberei 2022-09-30 \n",
"128934 Ausrüstung Ausrüstung 2022-10-04 \n",
"128935 Warenschau Warenschau 2022-10-04 \n",
"\n",
" ErstellungsDatum \n",
"2 2019-03-19 \n",
"3 2019-03-21 \n",
"4 2019-03-25 \n",
"5 2019-03-25 \n",
"6 2019-03-22 \n",
"... ... \n",
"128931 2023-03-14 \n",
"128932 2022-09-29 \n",
"128933 2022-09-30 \n",
"128934 2022-09-30 \n",
"128935 2022-10-03 \n",
"\n",
"[124008 rows x 20 columns]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"base"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Einträge: 124008\n"
]
}
],
"source": [
"descriptions = base['VorgangsBeschreibung']\n",
"print(f\"Einträge: {len(descriptions)}\")"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Anzahl Duplikate Vorgangsbeschreibungen: 117255\n",
"Anzahl einzigartiger Vorgangsbeschreibungen: 6753\n",
"Anteil einzigartiger Vorgangsbeschreibungen: 5.45 %\n"
]
}
],
"source": [
"num_dupl_descr = descriptions.duplicated().sum()\n",
"uni_descr = descriptions.unique()\n",
"num_uni_descr = len(uni_descr)\n",
"\n",
"print(f\"Anzahl Duplikate Vorgangsbeschreibungen: {num_dupl_descr}\")\n",
"print(f\"Anzahl einzigartiger Vorgangsbeschreibungen: {num_uni_descr}\")\n",
"print(f\"Anteil einzigartiger Vorgangsbeschreibungen: {num_uni_descr / len(descriptions) * 100:.2f} %\")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"if not LOAD_CALC_FILES:\n",
"    cols = ['descr', 'len', 'num_occur', 'assoc_obj_ids', 'num_assoc_obj_ids']\n",
"    descr_df = pd.DataFrame(columns=cols)\n",
"    max_val = 0\n",
"    text = None\n",
"    index = 0\n",
"\n",
"\n",
"    # collect statistics for every unique description\n",
"    for idx, description in enumerate(uni_descr):\n",
"        len_descr = len(description)\n",
"        filt = base['VorgangsBeschreibung'] == description\n",
"        temp = base[filt]\n",
"        # sorted unique object IDs that share this description\n",
"        assoc_obj_ids = temp['ObjektID'].unique()\n",
"        assoc_obj_ids = np.sort(assoc_obj_ids, kind='stable')\n",
"        num_assoc_obj_ids = len(assoc_obj_ids)\n",
"        num_dupl = filt.sum()\n",
"        \n",
"        conc_df = pd.DataFrame(data=[[\n",
"            description,\n",
"            len_descr,\n",
"            num_dupl,\n",
"            assoc_obj_ids,\n",
"            num_assoc_obj_ids\n",
"        ]], columns=cols)\n",
"        \n",
"        descr_df = pd.concat([descr_df, conc_df], ignore_index=True)\n",
"        \n",
"        # remember the most frequent description\n",
"        if num_dupl > max_val:\n",
"            max_val = num_dupl\n",
"            index = idx\n",
"            text = description\n",
"    \n",
"    temp1 = descr_df.sort_values(by='num_occur', ascending=False)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>descr</th>\n",
" <th>len</th>\n",
" <th>num_occur</th>\n",
" <th>assoc_obj_ids</th>\n",
" <th>num_assoc_obj_ids</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>161</th>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>66</td>\n",
" <td>92592</td>\n",
" <td>[0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53...</td>\n",
" <td>206</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>Wöchentliche Sichtkontrolle Reinigung</td>\n",
" <td>37</td>\n",
" <td>1654</td>\n",
" <td>[301, 304, 305, 313, 314, 331, 332, 510, 511, ...</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>130</th>\n",
" <td>Tägliche Überprüfung der Ölabscheider</td>\n",
" <td>37</td>\n",
" <td>1616</td>\n",
" <td>[0, 970, 2134, 2137]</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>159</th>\n",
" <td>Wöchentliche Kontrolle der WC-Anlagen</td>\n",
" <td>37</td>\n",
" <td>1265</td>\n",
" <td>[1352, 1353, 1354, 1684, 1685, 1686, 1687, 168...</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>139</th>\n",
" <td>Halbjährliche Kontrolle des Stabbreithalters</td>\n",
" <td>44</td>\n",
" <td>687</td>\n",
" <td>[51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6...</td>\n",
" <td>166</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2665</th>\n",
" <td>Überprüfung der Y-Achse Schneidbrücke am LC 2 ...</td>\n",
" <td>176</td>\n",
" <td>1</td>\n",
" <td>[20]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2664</th>\n",
" <td>Luftschlauch muss ausgetauscht werden. Ist und...</td>\n",
" <td>195</td>\n",
" <td>1</td>\n",
" <td>[1]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2663</th>\n",
" <td>Riemenscheibe tauschen auf 650 UPM</td>\n",
" <td>34</td>\n",
" <td>1</td>\n",
" <td>[74]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2660</th>\n",
" <td>Durchführung: Sollwert: 20 0,1g</td>\n",
" <td>31</td>\n",
" <td>1</td>\n",
" <td>[1746]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6752</th>\n",
" <td>Befestigung Deckel für Batteriefach defekt Hal...</td>\n",
" <td>99</td>\n",
" <td>1</td>\n",
" <td>[326]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6753 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" descr len num_occur \\\n",
"161 Tägliche Wartungstätigkeiten nach Vorgabe des ... 66 92592 \n",
"33 Wöchentliche Sichtkontrolle Reinigung 37 1654 \n",
"130 Tägliche Überprüfung der Ölabscheider 37 1616 \n",
"159 Wöchentliche Kontrolle der WC-Anlagen 37 1265 \n",
"139 Halbjährliche Kontrolle des Stabbreithalters 44 687 \n",
"... ... ... ... \n",
"2665 Überprüfung der Y-Achse Schneidbrücke am LC 2 ... 176 1 \n",
"2664 Luftschlauch muss ausgetauscht werden. Ist und... 195 1 \n",
"2663 Riemenscheibe tauschen auf 650 UPM 34 1 \n",
"2660 Durchführung: Sollwert: 20 0,1g 31 1 \n",
"6752 Befestigung Deckel für Batteriefach defekt Hal... 99 1 \n",
"\n",
" assoc_obj_ids num_assoc_obj_ids \n",
"161 [0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53... 206 \n",
"33 [301, 304, 305, 313, 314, 331, 332, 510, 511, ... 18 \n",
"130 [0, 970, 2134, 2137] 4 \n",
"159 [1352, 1353, 1354, 1684, 1685, 1686, 1687, 168... 11 \n",
"139 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6... 166 \n",
"... ... ... \n",
"2665 [20] 1 \n",
"2664 [1] 1 \n",
"2663 [74] 1 \n",
"2660 [1746] 1 \n",
"6752 [326] 1 \n",
"\n",
"[6753 rows x 5 columns]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp1"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"# save/load dataframe\n",
"FILE_PATH = 'VorgangsBeschreibung_analyse_1.fth'\n",
"if LOAD_CALC_FILES:\n",
" temp1 = pd.read_feather(FILE_PATH)\n",
" temp1 = temp1.set_index('index')\n",
"else:\n",
" save_df = temp1.reset_index()\n",
" save_df.to_feather(FILE_PATH)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"filt = temp1['descr'].str.contains('3-monatlich')"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>descr</th>\n",
" <th>len</th>\n",
" <th>num_occur</th>\n",
" <th>assoc_obj_ids</th>\n",
" <th>num_assoc_obj_ids</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>476</th>\n",
" <td>3-monatliche Sichtkontrolle Reinigung</td>\n",
" <td>37</td>\n",
" <td>222</td>\n",
" <td>[883, 1196, 1197, 1198, 1199, 1201, 1202, 1203...</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>671</th>\n",
" <td>3-monatliche Kontrolle</td>\n",
" <td>22</td>\n",
" <td>20</td>\n",
" <td>[2021, 2045]</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>303</th>\n",
" <td>3-monatliche Überprüfung durch Firma Siemens</td>\n",
" <td>44</td>\n",
" <td>16</td>\n",
" <td>[2029]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1055</th>\n",
" <td>3-monatliche Kontrolle der Wasserfilter, bei B...</td>\n",
" <td>186</td>\n",
" <td>16</td>\n",
" <td>[1175, 1176]</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>914</th>\n",
" <td>3-monatliche Überprüfung der Telefonanlage</td>\n",
" <td>42</td>\n",
" <td>14</td>\n",
" <td>[2035]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>281</th>\n",
" <td>3-monatliche Überprüfung der Torsprechanlage</td>\n",
" <td>44</td>\n",
" <td>14</td>\n",
" <td>[2037]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>280</th>\n",
" <td>3-monatliche Überprüfung der Sicherheitslichts...</td>\n",
" <td>111</td>\n",
" <td>14</td>\n",
" <td>[2046]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>658</th>\n",
" <td>3-monatliche Sichtkontrolle der Not- Sicherhei...</td>\n",
" <td>76</td>\n",
" <td>13</td>\n",
" <td>[2042]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>279</th>\n",
" <td>3-monatliche Überprüfung der Regalsicherungsan...</td>\n",
" <td>84</td>\n",
" <td>13</td>\n",
" <td>[2047]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3234</th>\n",
" <td>3-monatliche Überprüfung der Personen-Überwach...</td>\n",
" <td>72</td>\n",
" <td>11</td>\n",
" <td>[2040]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>608</th>\n",
" <td>3-monatliche Kontrolle der optischen Alarmgebe...</td>\n",
" <td>61</td>\n",
" <td>10</td>\n",
" <td>[2041]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>137</th>\n",
" <td>3-monatliche Überprüfung des Abwassers durch P...</td>\n",
" <td>70</td>\n",
" <td>8</td>\n",
" <td>[958]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2704</th>\n",
" <td>3-monatliche Reinigung</td>\n",
" <td>22</td>\n",
" <td>8</td>\n",
" <td>[903, 905]</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>995</th>\n",
" <td>3-monatliche Kontrolle der Erste-Hilfe-Kästen ...</td>\n",
" <td>103</td>\n",
" <td>7</td>\n",
" <td>[2456]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3207</th>\n",
" <td>3-monatliche Sichtkontrolle der Mittelspannung...</td>\n",
" <td>89</td>\n",
" <td>5</td>\n",
" <td>[2026]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3551</th>\n",
" <td>3-monatliche Überprüfung der Uhrenanlagen Betr...</td>\n",
" <td>84</td>\n",
" <td>4</td>\n",
" <td>[2034]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2939</th>\n",
" <td>3-monatliche Sichtkontrolle der Starkstrom-Anl...</td>\n",
" <td>75</td>\n",
" <td>4</td>\n",
" <td>[2021]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1001</th>\n",
" <td>3-monatliche Kontrolle des Seils eventueller A...</td>\n",
" <td>54</td>\n",
" <td>2</td>\n",
" <td>[838]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4054</th>\n",
" <td>3-monatliche Überprüfung des Abwassers durch P...</td>\n",
" <td>103</td>\n",
" <td>2</td>\n",
" <td>[958]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3587</th>\n",
" <td>3-monatliche Sichtkontrolle der optischen Alar...</td>\n",
" <td>66</td>\n",
" <td>2</td>\n",
" <td>[2041]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4284</th>\n",
" <td>3-monatliche Überprüfung des Abwassers durch P...</td>\n",
" <td>142</td>\n",
" <td>1</td>\n",
" <td>[958]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6157</th>\n",
" <td>3-monatliche Überprüfung des Abwassers durch P...</td>\n",
" <td>125</td>\n",
" <td>1</td>\n",
" <td>[958]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5333</th>\n",
" <td>3-monatliche Kontrolle des Seils eventueller A...</td>\n",
" <td>995</td>\n",
" <td>1</td>\n",
" <td>[838]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5743</th>\n",
" <td>3-monatliche Überprüfung durch Firma Siemens. ...</td>\n",
" <td>110</td>\n",
" <td>1</td>\n",
" <td>[2029]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" descr len num_occur \\\n",
"476 3-monatliche Sichtkontrolle Reinigung 37 222 \n",
"671 3-monatliche Kontrolle 22 20 \n",
"303 3-monatliche Überprüfung durch Firma Siemens 44 16 \n",
"1055 3-monatliche Kontrolle der Wasserfilter, bei B... 186 16 \n",
"914 3-monatliche Überprüfung der Telefonanlage 42 14 \n",
"281 3-monatliche Überprüfung der Torsprechanlage 44 14 \n",
"280 3-monatliche Überprüfung der Sicherheitslichts... 111 14 \n",
"658 3-monatliche Sichtkontrolle der Not- Sicherhei... 76 13 \n",
"279 3-monatliche Überprüfung der Regalsicherungsan... 84 13 \n",
"3234 3-monatliche Überprüfung der Personen-Überwach... 72 11 \n",
"608 3-monatliche Kontrolle der optischen Alarmgebe... 61 10 \n",
"137 3-monatliche Überprüfung des Abwassers durch P... 70 8 \n",
"2704 3-monatliche Reinigung 22 8 \n",
"995 3-monatliche Kontrolle der Erste-Hilfe-Kästen ... 103 7 \n",
"3207 3-monatliche Sichtkontrolle der Mittelspannung... 89 5 \n",
"3551 3-monatliche Überprüfung der Uhrenanlagen Betr... 84 4 \n",
"2939 3-monatliche Sichtkontrolle der Starkstrom-Anl... 75 4 \n",
"1001 3-monatliche Kontrolle des Seils eventueller A... 54 2 \n",
"4054 3-monatliche Überprüfung des Abwassers durch P... 103 2 \n",
"3587 3-monatliche Sichtkontrolle der optischen Alar... 66 2 \n",
"4284 3-monatliche Überprüfung des Abwassers durch P... 142 1 \n",
"6157 3-monatliche Überprüfung des Abwassers durch P... 125 1 \n",
"5333 3-monatliche Kontrolle des Seils eventueller A... 995 1 \n",
"5743 3-monatliche Überprüfung durch Firma Siemens. ... 110 1 \n",
"\n",
" assoc_obj_ids num_assoc_obj_ids \n",
"476 [883, 1196, 1197, 1198, 1199, 1201, 1202, 1203... 18 \n",
"671 [2021, 2045] 2 \n",
"303 [2029] 1 \n",
"1055 [1175, 1176] 2 \n",
"914 [2035] 1 \n",
"281 [2037] 1 \n",
"280 [2046] 1 \n",
"658 [2042] 1 \n",
"279 [2047] 1 \n",
"3234 [2040] 1 \n",
"608 [2041] 1 \n",
"137 [958] 1 \n",
"2704 [903, 905] 2 \n",
"995 [2456] 1 \n",
"3207 [2026] 1 \n",
"3551 [2034] 1 \n",
"2939 [2021] 1 \n",
"1001 [838] 1 \n",
"4054 [958] 1 \n",
"3587 [2041] 1 \n",
"4284 [958] 1 \n",
"6157 [958] 1 \n",
"5333 [838] 1 \n",
"5743 [2029] 1 "
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test2 = temp1.loc[filt,:]\n",
"test2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"def pre_clean_spell_check(string: str) -> str:\n",
" \n",
" for char in SPELL_CHECK_NON_CHARS:\n",
" string = string.replace(char, ' ')\n",
" \n",
" # remove spaces at the beginning and the end\n",
" string = string.strip()\n",
" \n",
" return string\n",
"\n",
"\n",
"test = temp1['descr'].map(pre_clean_spell_check)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"objs = temp1.loc[140, 'assoc_obj_ids']"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([7], dtype=int64)"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"objs"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>VorgangsID</th>\n",
" <th>ObjektID</th>\n",
" <th>HObjektText</th>\n",
" <th>ObjektArtID</th>\n",
" <th>ObjektArtText</th>\n",
" <th>VorgangsTypID</th>\n",
" <th>VorgangsTypName</th>\n",
" <th>VorgangsDatum</th>\n",
" <th>VorgangsStatusId</th>\n",
" <th>VorgangsPrioritaet</th>\n",
" <th>VorgangsBeschreibung</th>\n",
" <th>VorgangsOrt</th>\n",
" <th>VorgangsArtText</th>\n",
" <th>ErledigungsDatum</th>\n",
" <th>ErledigungsArtText</th>\n",
" <th>ErledigungsBeschreibung</th>\n",
" <th>MPMelderArbeitsplatz</th>\n",
" <th>MPAbteilungBezeichnung</th>\n",
" <th>Arbeitsbeginn</th>\n",
" <th>ErstellungsDatum</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>166</th>\n",
" <td>111649</td>\n",
" <td>7</td>\n",
" <td>00007, Ausrüstung 2,</td>\n",
" <td>2</td>\n",
" <td>Beschichtungsmaschinen</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2021-03-17</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>-Bereich Aufwicklung, Bogenwalze Madenschraube...</td>\n",
" <td>NaN</td>\n",
" <td>Walze (mechanischer Defekt)</td>\n",
" <td>2021-03-17</td>\n",
" <td>Intern UTT - Reparatur</td>\n",
" <td>Madenschrauben angezogen</td>\n",
" <td>Ausrüstung</td>\n",
" <td>Ausrüstung</td>\n",
" <td>2021-03-17</td>\n",
" <td>2021-03-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>222</th>\n",
" <td>133856</td>\n",
" <td>7</td>\n",
" <td>00007, Ausrüstung 2,</td>\n",
" <td>2</td>\n",
" <td>Beschichtungsmaschinen</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2021-07-27</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>2 Keilriemen gerissen- Bereich Abluftventilato...</td>\n",
" <td>NaN</td>\n",
" <td>Antriebsriemen (Keilriemen / Zahnriemen / Flac...</td>\n",
" <td>2021-07-27</td>\n",
" <td>Intern UTT - Reparatur</td>\n",
" <td>die Keilriemen SPA 1282 aus Neu Ulm geholt und...</td>\n",
" <td>Ausrüstung</td>\n",
" <td>Ausrüstung</td>\n",
" <td>2021-07-27</td>\n",
" <td>2021-07-27</td>\n",
" </tr>\n",
" <tr>\n",
" <th>240</th>\n",
" <td>140704</td>\n",
" <td>7</td>\n",
" <td>00007, Ausrüstung 2,</td>\n",
" <td>2</td>\n",
" <td>Beschichtungsmaschinen</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2021-09-29</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>Wir benötigen einen weiteren Ersatzteilschrank...</td>\n",
" <td>NaN</td>\n",
" <td>Allgemeine Reparaturarbeiten</td>\n",
" <td>2021-10-04</td>\n",
" <td>Intern UTT - Montage</td>\n",
" <td>Wurde montiert</td>\n",
" <td>Ausrüstung</td>\n",
" <td>Ausrüstung</td>\n",
" <td>2021-10-04</td>\n",
" <td>2021-09-29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>437</th>\n",
" <td>123811</td>\n",
" <td>7</td>\n",
" <td>00007, Ausrüstung 2,</td>\n",
" <td>2</td>\n",
" <td>Beschichtungsmaschinen</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2021-05-06</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>bitte dringend 10l Eimer zum Silikon versenden...</td>\n",
" <td>NaN</td>\n",
" <td>Maschineninfrastruktur</td>\n",
" <td>2021-05-06</td>\n",
" <td>Intern UTT - Wartung</td>\n",
" <td>NaN</td>\n",
" <td>Ausrüstung</td>\n",
" <td>Ausrüstung</td>\n",
" <td>2021-05-06</td>\n",
" <td>2021-05-06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>439</th>\n",
" <td>107885</td>\n",
" <td>7</td>\n",
" <td>00007, Ausrüstung 2,</td>\n",
" <td>2</td>\n",
" <td>Beschichtungsmaschinen</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2021-07-01</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>Monatliche Kontrolle des Flusen-Absaugrohrs</td>\n",
" <td>NaN</td>\n",
" <td>Maschinen-Wartung monatlich</td>\n",
" <td>2021-06-28</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaT</td>\n",
" <td>2021-03-03</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128396</th>\n",
" <td>531424</td>\n",
" <td>7</td>\n",
" <td>00007, Ausrüstung 2,</td>\n",
" <td>2</td>\n",
" <td>Beschichtungsmaschinen</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2023-05-08</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>KKT Chiller Auslauf Störung. Füllstand Min. STOP</td>\n",
" <td>NaN</td>\n",
" <td>Allgemeine Reparaturarbeiten</td>\n",
" <td>2023-05-08</td>\n",
" <td>Intern UTT - Reparatur</td>\n",
" <td>Kühlflüssigkeit aufgefüllt und Filter gewechse...</td>\n",
" <td>Ausrüstung 2</td>\n",
" <td>Ausrüstung</td>\n",
" <td>2023-05-08</td>\n",
" <td>2023-05-08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128446</th>\n",
" <td>530613</td>\n",
" <td>7</td>\n",
" <td>00007, Ausrüstung 2,</td>\n",
" <td>2</td>\n",
" <td>Beschichtungsmaschinen</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2023-05-30</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>Monatliche Überprüfung der Gasleitung mit dem ...</td>\n",
" <td>NaN</td>\n",
" <td>01 Interne Reinigung / Pflege / Überprüfung</td>\n",
" <td>2023-06-05</td>\n",
" <td>Intern UTT - Prüfung</td>\n",
" <td>Dichtheitsprüfung der Gasleitungen</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2023-06-05</td>\n",
" <td>2023-04-24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128563</th>\n",
" <td>580234</td>\n",
" <td>7</td>\n",
" <td>00007, Ausrüstung 2,</td>\n",
" <td>2</td>\n",
" <td>Beschichtungsmaschinen</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2023-05-30</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>Mischer für Beschichtungsanlage bitte ausbrenn...</td>\n",
" <td>NaN</td>\n",
" <td>Allgemeine Reparaturarbeiten</td>\n",
" <td>2023-05-30</td>\n",
" <td>Intern UTT - Reparatur</td>\n",
" <td>erledigt</td>\n",
" <td>Ausrüstung 2, Kombianlage</td>\n",
" <td>Ausrüstung</td>\n",
" <td>2023-05-30</td>\n",
" <td>2023-05-30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128636</th>\n",
" <td>586208</td>\n",
" <td>7</td>\n",
" <td>00007, Ausrüstung 2,</td>\n",
" <td>2</td>\n",
" <td>Beschichtungsmaschinen</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2023-06-12</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>Haken im Kran Auslauf defekt</td>\n",
" <td>NaN</td>\n",
" <td>Allgemeine Reparaturarbeiten</td>\n",
" <td>2023-06-12</td>\n",
" <td>Intern UTT - Reparatur</td>\n",
" <td>Haken getauscht</td>\n",
" <td>Ausrüstung</td>\n",
" <td>Ausrüstung</td>\n",
" <td>2023-06-12</td>\n",
" <td>2023-06-12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128915</th>\n",
" <td>261786</td>\n",
" <td>7</td>\n",
" <td>00007, Ausrüstung 2,</td>\n",
" <td>2</td>\n",
" <td>Beschichtungsmaschinen</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2023-05-30</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>Kontrolle der Risiko-Ersatzteile</td>\n",
" <td>NaN</td>\n",
" <td>Überprüfung Risikoersatzteile</td>\n",
" <td>2023-05-30</td>\n",
" <td>Intern UTT - Dokumentenkontrolle</td>\n",
" <td>erledigt.\\n</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2023-05-30</td>\n",
" <td>2022-06-30</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>272 rows × 20 columns</p>\n",
"</div>"
],
"text/plain": [
" VorgangsID ObjektID HObjektText ObjektArtID \\\n",
"166 111649 7 00007, Ausrüstung 2, 2 \n",
"222 133856 7 00007, Ausrüstung 2, 2 \n",
"240 140704 7 00007, Ausrüstung 2, 2 \n",
"437 123811 7 00007, Ausrüstung 2, 2 \n",
"439 107885 7 00007, Ausrüstung 2, 2 \n",
"... ... ... ... ... \n",
"128396 531424 7 00007, Ausrüstung 2, 2 \n",
"128446 530613 7 00007, Ausrüstung 2, 2 \n",
"128563 580234 7 00007, Ausrüstung 2, 2 \n",
"128636 586208 7 00007, Ausrüstung 2, 2 \n",
"128915 261786 7 00007, Ausrüstung 2, 2 \n",
"\n",
" ObjektArtText VorgangsTypID VorgangsTypName \\\n",
"166 Beschichtungsmaschinen 3 Reparaturauftrag (Portal) \n",
"222 Beschichtungsmaschinen 3 Reparaturauftrag (Portal) \n",
"240 Beschichtungsmaschinen 3 Reparaturauftrag (Portal) \n",
"437 Beschichtungsmaschinen 3 Reparaturauftrag (Portal) \n",
"439 Beschichtungsmaschinen 1 Wartung \n",
"... ... ... ... \n",
"128396 Beschichtungsmaschinen 3 Reparaturauftrag (Portal) \n",
"128446 Beschichtungsmaschinen 1 Wartung \n",
"128563 Beschichtungsmaschinen 3 Reparaturauftrag (Portal) \n",
"128636 Beschichtungsmaschinen 3 Reparaturauftrag (Portal) \n",
"128915 Beschichtungsmaschinen 1 Wartung \n",
"\n",
" VorgangsDatum VorgangsStatusId VorgangsPrioritaet \\\n",
"166 2021-03-17 5 1 \n",
"222 2021-07-27 5 1 \n",
"240 2021-09-29 5 1 \n",
"437 2021-05-06 5 1 \n",
"439 2021-07-01 5 1 \n",
"... ... ... ... \n",
"128396 2023-05-08 5 1 \n",
"128446 2023-05-30 5 1 \n",
"128563 2023-05-30 5 1 \n",
"128636 2023-06-12 5 1 \n",
"128915 2023-05-30 5 1 \n",
"\n",
" VorgangsBeschreibung VorgangsOrt \\\n",
"166 -Bereich Aufwicklung, Bogenwalze Madenschraube... NaN \n",
"222 2 Keilriemen gerissen- Bereich Abluftventilato... NaN \n",
"240 Wir benötigen einen weiteren Ersatzteilschrank... NaN \n",
"437 bitte dringend 10l Eimer zum Silikon versenden... NaN \n",
"439 Monatliche Kontrolle des Flusen-Absaugrohrs NaN \n",
"... ... ... \n",
"128396 KKT Chiller Auslauf Störung. Füllstand Min. STOP NaN \n",
"128446 Monatliche Überprüfung der Gasleitung mit dem ... NaN \n",
"128563 Mischer für Beschichtungsanlage bitte ausbrenn... NaN \n",
"128636 Haken im Kran Auslauf defekt NaN \n",
"128915 Kontrolle der Risiko-Ersatzteile NaN \n",
"\n",
" VorgangsArtText ErledigungsDatum \\\n",
"166 Walze (mechanischer Defekt) 2021-03-17 \n",
"222 Antriebsriemen (Keilriemen / Zahnriemen / Flac... 2021-07-27 \n",
"240 Allgemeine Reparaturarbeiten 2021-10-04 \n",
"437 Maschineninfrastruktur 2021-05-06 \n",
"439 Maschinen-Wartung monatlich 2021-06-28 \n",
"... ... ... \n",
"128396 Allgemeine Reparaturarbeiten 2023-05-08 \n",
"128446 01 Interne Reinigung / Pflege / Überprüfung 2023-06-05 \n",
"128563 Allgemeine Reparaturarbeiten 2023-05-30 \n",
"128636 Allgemeine Reparaturarbeiten 2023-06-12 \n",
"128915 Überprüfung Risikoersatzteile 2023-05-30 \n",
"\n",
" ErledigungsArtText \\\n",
"166 Intern UTT - Reparatur \n",
"222 Intern UTT - Reparatur \n",
"240 Intern UTT - Montage \n",
"437 Intern UTT - Wartung \n",
"439 Intern UTT - Sichtkontrolle \n",
"... ... \n",
"128396 Intern UTT - Reparatur \n",
"128446 Intern UTT - Prüfung \n",
"128563 Intern UTT - Reparatur \n",
"128636 Intern UTT - Reparatur \n",
"128915 Intern UTT - Dokumentenkontrolle \n",
"\n",
" ErledigungsBeschreibung \\\n",
"166 Madenschrauben angezogen \n",
"222 die Keilriemen SPA 1282 aus Neu Ulm geholt und... \n",
"240 Wurde montiert \n",
"437 NaN \n",
"439 NaN \n",
"... ... \n",
"128396 Kühlflüssigkeit aufgefüllt und Filter gewechse... \n",
"128446 Dichtheitsprüfung der Gasleitungen \n",
"128563 erledigt \n",
"128636 Haken getauscht \n",
"128915 erledigt.\\n \n",
"\n",
" MPMelderArbeitsplatz MPAbteilungBezeichnung Arbeitsbeginn \\\n",
"166 Ausrüstung Ausrüstung 2021-03-17 \n",
"222 Ausrüstung Ausrüstung 2021-07-27 \n",
"240 Ausrüstung Ausrüstung 2021-10-04 \n",
"437 Ausrüstung Ausrüstung 2021-05-06 \n",
"439 NaN NaN NaT \n",
"... ... ... ... \n",
"128396 Ausrüstung 2 Ausrüstung 2023-05-08 \n",
"128446 NaN NaN 2023-06-05 \n",
"128563 Ausrüstung 2, Kombianlage Ausrüstung 2023-05-30 \n",
"128636 Ausrüstung Ausrüstung 2023-06-12 \n",
"128915 NaN NaN 2023-05-30 \n",
"\n",
" ErstellungsDatum \n",
"166 2021-03-17 \n",
"222 2021-07-27 \n",
"240 2021-09-29 \n",
"437 2021-05-06 \n",
"439 2021-03-03 \n",
"... ... \n",
"128396 2023-05-08 \n",
"128446 2023-04-24 \n",
"128563 2023-05-30 \n",
"128636 2023-06-12 \n",
"128915 2022-06-30 \n",
"\n",
"[272 rows x 20 columns]"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"base.loc[base['ObjektID'] == objs[0],:]"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>descr</th>\n",
" <th>len</th>\n",
" <th>num_occur</th>\n",
" <th>assoc_obj_ids</th>\n",
" <th>num_assoc_obj_ids</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>161</th>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>66</td>\n",
" <td>92592</td>\n",
" <td>[0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53...</td>\n",
" <td>206</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>Wöchentliche Sichtkontrolle Reinigung</td>\n",
" <td>37</td>\n",
" <td>1654</td>\n",
" <td>[301, 304, 305, 313, 314, 331, 332, 510, 511, ...</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>130</th>\n",
" <td>Tägliche Überprüfung der Ölabscheider</td>\n",
" <td>37</td>\n",
" <td>1616</td>\n",
" <td>[0, 970, 2134, 2137]</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>159</th>\n",
" <td>Wöchentliche Kontrolle der WC-Anlagen</td>\n",
" <td>37</td>\n",
" <td>1265</td>\n",
" <td>[1352, 1353, 1354, 1684, 1685, 1686, 1687, 168...</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>139</th>\n",
" <td>Halbjährliche Kontrolle des Stabbreithalters</td>\n",
" <td>44</td>\n",
" <td>687</td>\n",
" <td>[51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6...</td>\n",
" <td>166</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2665</th>\n",
" <td>Überprüfung der Y-Achse Schneidbrücke am LC 2 ...</td>\n",
" <td>176</td>\n",
" <td>1</td>\n",
" <td>[20]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2664</th>\n",
" <td>Luftschlauch muss ausgetauscht werden. Ist und...</td>\n",
" <td>195</td>\n",
" <td>1</td>\n",
" <td>[1]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2663</th>\n",
" <td>Riemenscheibe tauschen auf 650 UPM</td>\n",
" <td>34</td>\n",
" <td>1</td>\n",
" <td>[74]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2660</th>\n",
" <td>Durchführung: Sollwert: 20 0,1g</td>\n",
" <td>31</td>\n",
" <td>1</td>\n",
" <td>[1746]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6752</th>\n",
" <td>Befestigung Deckel für Batteriefach defekt Hal...</td>\n",
" <td>99</td>\n",
" <td>1</td>\n",
" <td>[326]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6753 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" descr len num_occur \\\n",
"161 Tägliche Wartungstätigkeiten nach Vorgabe des ... 66 92592 \n",
"33 Wöchentliche Sichtkontrolle Reinigung 37 1654 \n",
"130 Tägliche Überprüfung der Ölabscheider 37 1616 \n",
"159 Wöchentliche Kontrolle der WC-Anlagen 37 1265 \n",
"139 Halbjährliche Kontrolle des Stabbreithalters 44 687 \n",
"... ... ... ... \n",
"2665 Überprüfung der Y-Achse Schneidbrücke am LC 2 ... 176 1 \n",
"2664 Luftschlauch muss ausgetauscht werden. Ist und... 195 1 \n",
"2663 Riemenscheibe tauschen auf 650 UPM 34 1 \n",
"2660 Durchführung: Sollwert: 20 0,1g 31 1 \n",
"6752 Befestigung Deckel für Batteriefach defekt Hal... 99 1 \n",
"\n",
" assoc_obj_ids num_assoc_obj_ids \n",
"161 [0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53... 206 \n",
"33 [301, 304, 305, 313, 314, 331, 332, 510, 511, ... 18 \n",
"130 [0, 970, 2134, 2137] 4 \n",
"159 [1352, 1353, 1354, 1684, 1685, 1686, 1687, 168... 11 \n",
"139 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6... 166 \n",
"... ... ... \n",
"2665 [20] 1 \n",
"2664 [1] 1 \n",
"2663 [74] 1 \n",
"2660 [1746] 1 \n",
"6752 [326] 1 \n",
"\n",
"[6753 rows x 5 columns]"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp1"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Tägliche Wartungstätigkeiten nach Vorgabe des Maschinenherstellers'"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp1.iat[0,0]"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Wöchentliche Sichtkontrolle Reinigung'"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp1.iat[1,0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### spaCy"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Durchführung: Sollwert: 20 0,1g'"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"string = temp1.iloc[-2,0]\n",
"#string = temp1.iloc[0,0]\n",
"string"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"string = 'Ich spiele jeden Tag mit den Kindern im Garten. Das ist schön.'\n",
"string = 'Die Maschine XYZ ist aufgrund einer Störung im Druckluftsystem defekt.'\n",
"#string = 'Wir benötigen das Werkzeug von Herr Stöppel, um das derzeit abzuarbeiten.Dies wird durch Herrn Strebe getan.'"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"doc = nlp(string)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(11, 11)\n"
]
},
{
"data": {
"text/plain": [
"array([[ 0, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3],\n",
" [ 0, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3],\n",
" [ 0, 0, 2, 3, 3, 3, 3, 3, 3, 3, 3],\n",
" [ 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3],\n",
" [ 0, 0, 0, 0, 4, 4, 4, 4, 4, 3, 3],\n",
" [ 0, 0, 0, 0, 0, 5, 6, 6, 6, 3, 3],\n",
" [ 0, 0, 0, 0, 0, 0, 6, 6, 6, 3, 3],\n",
" [ 0, 0, 0, 0, 0, 0, 0, 7, 7, 3, 3],\n",
" [ 0, 0, 0, 0, 0, 0, 0, 0, 8, 3, 3],\n",
" [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 3],\n",
" [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10]])"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lca_matrix = doc.get_lca_matrix()\n",
"print(lca_matrix.shape)\n",
"lca_matrix = np.triu(lca_matrix)\n",
"lca_matrix"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"nested children:\n",
"- [x] Weighting by number of occurrences\n",
"- [x] AUX words: possibly relate all associated words to one another\n",
"- [ ] Dual link between two words of a tree (useful?)\n",
"    - not really useful, since the single connection is already accounted for by its weight\n",
"    - in the end, the weight of every connection would simply be doubled"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"# simulate occurrence counter\n",
"OCC_COUNTER = 10"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"SPELL_CHECK_NON_CHARS = set([' ', '.', ',', ';', ':', '-'])\n",
"\n",
"def pre_clean_word(string: str) -> str:\n",
"    # keep letters only; note that 'ß' is not part of the character class\n",
"    # and is therefore stripped as well\n",
"    pattern = r'[^A-Za-zäöüÄÖÜ]+'\n",
"    string = re.sub(pattern, '', string)\n",
"    \"\"\"\n",
"    for char in SPELL_CHECK_NON_CHARS:\n",
"        string = string.replace(char, '')\n",
"    \"\"\"\n",
"    \n",
"    return string\n",
"\n",
"# https://stackoverflow.com/questions/25341945/check-if-string-has-date-any-format \n",
"def is_str_date(string, fuzzy=False):\n",
"    # a string counts as a date if dateutil can parse it\n",
"    try:\n",
"        parse(string, fuzzy=fuzzy)\n",
"        return True\n",
"    except (ValueError, OverflowError):\n",
"        return False\n",
"\n",
"\n",
"def obtain_sub_tree(token):\n",
"    # collect all tokens of the subtree, excluding the token itself\n",
"    descendants = list(token.subtree)\n",
"    descendants.remove(token)\n",
"    logger.debug(f'Token >>{token}<< has subtree >>{descendants}<<')\n",
"    return descendants\n",
"\n",
"\n",
"def add_children_descendants(\n",
"    parent,\n",
"    weight,\n",
"    connections,\n",
"    unique_tokens,\n",
"    children_sents,\n",
"):\n",
"    # add parent as key; each value is a list of per-sentence child lists\n",
"    if (parent.lemma_, parent.pos_) in connections:\n",
"        connections[(parent.lemma_, parent.pos_)].append(children_sents)\n",
"        #connections[parent.lemma_].append([descendant.lemma_, descendant])\n",
"    else:\n",
"        # do not add auxiliary words to the set of unique tokens\n",
"        if parent.pos_ != 'AUX':\n",
"            unique_tokens.add(parent.lemma_)\n",
"        connections[(parent.lemma_, parent.pos_)] = list()\n",
"        connections[(parent.lemma_, parent.pos_)].append(children_sents)\n",
"        #connections[parent.lemma_].append([descendant.lemma_, descendant])\n",
"    \n",
"    return None\n",
"\n",
"\n",
"def obtain_descendant_info(\n",
"    doc,\n",
"    weight,\n",
"    POS_of_interest,\n",
"    TAG_of_interest,\n",
"    connections,\n",
"    unique_tokens,\n",
"    spell_check_candidates,\n",
"    spell_check_whitelist,\n",
"    spell_checker,\n",
"    corrections,\n",
"):\n",
"    global GENERAL_BLACKLIST\n",
"    \n",
"    # iterate over sentences\n",
"    for sent in doc.sents:\n",
"        # words of the current sentence that are passed to the spell checker\n",
"        spell_check_words = list()\n",
"        \n",
"        # iterate over tokens in one sentence\n",
"        for token in sent:\n",
"            \n",
"            if not (token.pos_ in POS_of_interest or token.tag_ in TAG_of_interest):\n",
"                continue\n",
"            elif token.lemma_.lower() in GENERAL_BLACKLIST:\n",
"                logger.debug(f'Eliminated parent >>{token}<< because of blacklist')\n",
"                continue\n",
"            \n",
"            # spell check\n",
"            if token.lemma_.lower() not in spell_check_whitelist:\n",
"                word = pre_clean_word(string=token.lemma_.lower())\n",
"                if word in corrections:\n",
"                    # a known correction exists, so the word is no candidate\n",
"                    word = corrections[word]\n",
"                elif not word.isdigit():\n",
"                    spell_check_words.append(word)\n",
"            \n",
"            descendants = obtain_sub_tree(token=token)\n",
"            \n",
"            # iterate over all children if there are any\n",
"            if descendants:\n",
"                # list with all children in the current sentence\n",
"                children_sents = list()\n",
"                \n",
"                for child in descendants:\n",
"                    logger.debug(f'Token is >>{token}<< with child >>{child}<< and POS {child.pos_}')\n",
"                    \n",
"                    # eliminate cases of cross-references with verbs\n",
"                    if ((token.pos_ == 'AUX' or token.pos_ == 'VERB') and\n",
"                        (child.pos_ == 'AUX' or child.pos_ == 'VERB')):\n",
"                        continue\n",
"                    elif not (child.pos_ in POS_of_interest or child.tag_ in TAG_of_interest):\n",
"                        continue\n",
"                    elif child.lemma_.lower() in GENERAL_BLACKLIST:\n",
"                        logger.debug(f'Eliminated child >>{child}<< because of blacklist')\n",
"                        continue\n",
"                    \n",
"                    if (child not in DESC_BLACKLIST and\n",
"                        not is_str_date(string=child.text)):\n",
"                        children_sents.append((child.lemma_, weight))\n",
"                        \n",
"                        if child.lemma_ not in unique_tokens:\n",
"                            unique_tokens.add(child.lemma_)\n",
"                    \n",
"                    if child.lemma_.lower() not in spell_check_whitelist:\n",
"                        word = pre_clean_word(string=child.lemma_.lower())\n",
"                        if word in corrections:\n",
"                            word = corrections[word]\n",
"                        elif not word.isdigit():\n",
"                            spell_check_words.append(word)\n",
"                \n",
"                # add list of children for current parent if not empty\n",
"                if children_sents:\n",
"                    add_children_descendants(\n",
"                        parent=token,\n",
"                        weight=weight,\n",
"                        connections=connections,\n",
"                        unique_tokens=unique_tokens,\n",
"                        children_sents=children_sents,\n",
"                    )\n",
"\n",
"        # collect words of this sentence that the spell checker does not know\n",
"        misspelled_candidates = spell_checker.unknown(spell_check_words)\n",
"        spell_check_candidates.update(misspelled_candidates)\n",
"    \n",
"    \n",
"    return None"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Token</th>\n",
" <th>Lemma</th>\n",
" <th>POS</th>\n",
" <th>Tag</th>\n",
" <th>Dep</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Die</td>\n",
" <td>der</td>\n",
" <td>DET</td>\n",
" <td>ART</td>\n",
" <td>nk</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Maschine</td>\n",
" <td>Maschine</td>\n",
" <td>NOUN</td>\n",
" <td>NN</td>\n",
" <td>sb</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>XYZ</td>\n",
" <td>XYZ</td>\n",
" <td>PROPN</td>\n",
" <td>NE</td>\n",
" <td>nk</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>ist</td>\n",
" <td>sein</td>\n",
" <td>AUX</td>\n",
" <td>VAFIN</td>\n",
" <td>ROOT</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>aufgrund</td>\n",
" <td>aufgrund</td>\n",
" <td>ADP</td>\n",
" <td>APPR</td>\n",
" <td>mo</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>einer</td>\n",
" <td>ein</td>\n",
" <td>DET</td>\n",
" <td>ART</td>\n",
" <td>nk</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Störung</td>\n",
" <td>Störung</td>\n",
" <td>NOUN</td>\n",
" <td>NN</td>\n",
" <td>nk</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>im</td>\n",
" <td>in</td>\n",
" <td>ADP</td>\n",
" <td>APPRART</td>\n",
" <td>mnr</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Druckluftsystem</td>\n",
" <td>Druckluftsystem</td>\n",
" <td>NOUN</td>\n",
" <td>NN</td>\n",
" <td>nk</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>defekt</td>\n",
" <td>defekt</td>\n",
" <td>ADV</td>\n",
" <td>ADJD</td>\n",
" <td>pd</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>.</td>\n",
" <td>--</td>\n",
" <td>PUNCT</td>\n",
" <td>$.</td>\n",
" <td>punct</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Token Lemma POS Tag Dep\n",
"0 Die der DET ART nk\n",
"1 Maschine Maschine NOUN NN sb\n",
"2 XYZ XYZ PROPN NE nk\n",
"3 ist sein AUX VAFIN ROOT\n",
"4 aufgrund aufgrund ADP APPR mo\n",
"5 einer ein DET ART nk\n",
"6 Störung Störung NOUN NN nk\n",
"7 im in ADP APPRART mnr\n",
"8 Druckluftsystem Druckluftsystem NOUN NN nk\n",
"9 defekt defekt ADV ADJD pd\n",
"10 . -- PUNCT $. punct"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame({\"Token\": [token.text for token in doc],\n",
" \"Lemma\": [token.lemma_ for token in doc],\n",
" \"POS\": [token.pos_ for token in doc],\n",
" \"Tag\": [token.tag_ for token in doc],\n",
" \"Dep\": [token.dep_ for token in doc]})"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"def obtain_adj_matrix(unique_tokens, connections):\n",
"    # square matrix over all unique tokens, initialised with zeros\n",
"    adj_mat = pd.DataFrame(\n",
"        data=0, \n",
"        columns=list(unique_tokens), \n",
"        index=list(unique_tokens),\n",
"        dtype=np.uint32,\n",
"    )\n",
"    \n",
"    for (pred, POS), descendants_list in connections.items():\n",
"        #print(f'{pred=}, {descendants=}')\n",
"        \n",
"        for descendants in descendants_list:\n",
"            #print(f'{descendants}')\n",
"            \n",
"            if POS != 'AUX':\n",
"                # directed edge from parent to each descendant\n",
"                for (desc, weight) in descendants:\n",
"                    adj_mat.at[pred, desc] += weight\n",
"            \n",
"            else:\n",
"                if len(descendants) > 1:\n",
"                    # if auxiliary word, make connection between all associated words\n",
"                    combs = combinations(descendants, r=2)\n",
"                    \n",
"                    for comb in combs:\n",
"                        # comb is tuple ((word_1, weight), (word_2, weight))\n",
"                        weight = comb[0][1]\n",
"                        word_1 = comb[0][0]\n",
"                        word_2 = comb[1][0]\n",
"                        \n",
"                        \"\"\"\n",
"                        if ((word_1 == 'Eigenverantwortlichkeit' or word_1 == 'neu') and\n",
"                            (word_2 == 'Eigenverantwortlichkeit' or word_2 == 'neu')):\n",
"                            print(f'Hello from {pred=} with {descendants=}')\n",
"                        \"\"\"\n",
"                        \n",
"                        adj_mat.at[word_1, word_2] += weight\n",
"    \n",
"    \n",
"    return adj_mat\n",
"\n",
"\n",
"def make_undir_adj_matrix(adj_mat):\n",
"    # fold the lower triangle onto the upper one, so that each edge weight\n",
"    # is stored only once in the upper triangle\n",
"    adj_mat_undir = adj_mat.copy()\n",
"    arr = adj_mat_undir.to_numpy()\n",
"    arr_upper = np.triu(arr)\n",
"    arr_lower = np.tril(arr)\n",
"    # rot90(fliplr(x)) is equivalent to the transpose x.T\n",
"    arr_lower = np.rot90(np.fliplr(arr_lower))\n",
"    # note: diagonal entries are part of both triangles and are counted twice\n",
"    arr_new = arr_lower + arr_upper\n",
"    \n",
"    adj_mat_undir.loc[:] = arr_new\n",
"    \n",
"    return adj_mat_undir"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<span class=\"tex2jax_ignore\"><svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" xml:lang=\"de\" id=\"ef1bf216422b43369e5c4ba89cb854dd-0\" class=\"displacy\" width=\"1800\" height=\"399.5\" direction=\"ltr\" style=\"max-width: none; height: 399.5px; color: #000000; background: #ffffff; font-family: Arial; direction: ltr\">\n",
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"50\">Die</tspan>\n",
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"50\">DET</tspan>\n",
"</text>\n",
"\n",
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"225\">Maschine</tspan>\n",
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"225\">NOUN</tspan>\n",
"</text>\n",
"\n",
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"400\">XYZ</tspan>\n",
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"400\">PROPN</tspan>\n",
"</text>\n",
"\n",
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"575\">ist</tspan>\n",
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"575\">AUX</tspan>\n",
"</text>\n",
"\n",
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"750\">aufgrund</tspan>\n",
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"750\">ADP</tspan>\n",
"</text>\n",
"\n",
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"925\">einer</tspan>\n",
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"925\">DET</tspan>\n",
"</text>\n",
"\n",
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"1100\">Störung</tspan>\n",
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"1100\">NOUN</tspan>\n",
"</text>\n",
"\n",
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"1275\">im</tspan>\n",
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"1275\">ADP</tspan>\n",
"</text>\n",
"\n",
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"1450\">Druckluftsystem</tspan>\n",
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"1450\">NOUN</tspan>\n",
"</text>\n",
"\n",
"<text class=\"displacy-token\" fill=\"currentColor\" text-anchor=\"middle\" y=\"309.5\">\n",
" <tspan class=\"displacy-word\" fill=\"currentColor\" x=\"1625\">defekt.</tspan>\n",
" <tspan class=\"displacy-tag\" dy=\"2em\" fill=\"currentColor\" x=\"1625\">ADV</tspan>\n",
"</text>\n",
"\n",
"<g class=\"displacy-arrow\">\n",
" <path class=\"displacy-arc\" id=\"arrow-ef1bf216422b43369e5c4ba89cb854dd-0-0\" stroke-width=\"2px\" d=\"M70,264.5 C70,177.0 215.0,177.0 215.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
" <textPath xlink:href=\"#arrow-ef1bf216422b43369e5c4ba89cb854dd-0-0\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">nk</textPath>\n",
" </text>\n",
" <path class=\"displacy-arrowhead\" d=\"M70,266.5 L62,254.5 78,254.5\" fill=\"currentColor\"/>\n",
"</g>\n",
"\n",
"<g class=\"displacy-arrow\">\n",
" <path class=\"displacy-arc\" id=\"arrow-ef1bf216422b43369e5c4ba89cb854dd-0-1\" stroke-width=\"2px\" d=\"M245,264.5 C245,89.5 570.0,89.5 570.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
" <textPath xlink:href=\"#arrow-ef1bf216422b43369e5c4ba89cb854dd-0-1\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">sb</textPath>\n",
" </text>\n",
" <path class=\"displacy-arrowhead\" d=\"M245,266.5 L237,254.5 253,254.5\" fill=\"currentColor\"/>\n",
"</g>\n",
"\n",
"<g class=\"displacy-arrow\">\n",
" <path class=\"displacy-arc\" id=\"arrow-ef1bf216422b43369e5c4ba89cb854dd-0-2\" stroke-width=\"2px\" d=\"M245,264.5 C245,177.0 390.0,177.0 390.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
" <textPath xlink:href=\"#arrow-ef1bf216422b43369e5c4ba89cb854dd-0-2\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">nk</textPath>\n",
" </text>\n",
" <path class=\"displacy-arrowhead\" d=\"M390.0,266.5 L398.0,254.5 382.0,254.5\" fill=\"currentColor\"/>\n",
"</g>\n",
"\n",
"<g class=\"displacy-arrow\">\n",
" <path class=\"displacy-arc\" id=\"arrow-ef1bf216422b43369e5c4ba89cb854dd-0-3\" stroke-width=\"2px\" d=\"M595,264.5 C595,177.0 740.0,177.0 740.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
" <textPath xlink:href=\"#arrow-ef1bf216422b43369e5c4ba89cb854dd-0-3\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">mo</textPath>\n",
" </text>\n",
" <path class=\"displacy-arrowhead\" d=\"M740.0,266.5 L748.0,254.5 732.0,254.5\" fill=\"currentColor\"/>\n",
"</g>\n",
"\n",
"<g class=\"displacy-arrow\">\n",
" <path class=\"displacy-arc\" id=\"arrow-ef1bf216422b43369e5c4ba89cb854dd-0-4\" stroke-width=\"2px\" d=\"M945,264.5 C945,177.0 1090.0,177.0 1090.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
" <textPath xlink:href=\"#arrow-ef1bf216422b43369e5c4ba89cb854dd-0-4\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">nk</textPath>\n",
" </text>\n",
" <path class=\"displacy-arrowhead\" d=\"M945,266.5 L937,254.5 953,254.5\" fill=\"currentColor\"/>\n",
"</g>\n",
"\n",
"<g class=\"displacy-arrow\">\n",
" <path class=\"displacy-arc\" id=\"arrow-ef1bf216422b43369e5c4ba89cb854dd-0-5\" stroke-width=\"2px\" d=\"M770,264.5 C770,89.5 1095.0,89.5 1095.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
" <textPath xlink:href=\"#arrow-ef1bf216422b43369e5c4ba89cb854dd-0-5\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">nk</textPath>\n",
" </text>\n",
" <path class=\"displacy-arrowhead\" d=\"M1095.0,266.5 L1103.0,254.5 1087.0,254.5\" fill=\"currentColor\"/>\n",
"</g>\n",
"\n",
"<g class=\"displacy-arrow\">\n",
" <path class=\"displacy-arc\" id=\"arrow-ef1bf216422b43369e5c4ba89cb854dd-0-6\" stroke-width=\"2px\" d=\"M1120,264.5 C1120,177.0 1265.0,177.0 1265.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
" <textPath xlink:href=\"#arrow-ef1bf216422b43369e5c4ba89cb854dd-0-6\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">mnr</textPath>\n",
" </text>\n",
" <path class=\"displacy-arrowhead\" d=\"M1265.0,266.5 L1273.0,254.5 1257.0,254.5\" fill=\"currentColor\"/>\n",
"</g>\n",
"\n",
"<g class=\"displacy-arrow\">\n",
" <path class=\"displacy-arc\" id=\"arrow-ef1bf216422b43369e5c4ba89cb854dd-0-7\" stroke-width=\"2px\" d=\"M1295,264.5 C1295,177.0 1440.0,177.0 1440.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
" <textPath xlink:href=\"#arrow-ef1bf216422b43369e5c4ba89cb854dd-0-7\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">nk</textPath>\n",
" </text>\n",
" <path class=\"displacy-arrowhead\" d=\"M1440.0,266.5 L1448.0,254.5 1432.0,254.5\" fill=\"currentColor\"/>\n",
"</g>\n",
"\n",
"<g class=\"displacy-arrow\">\n",
" <path class=\"displacy-arc\" id=\"arrow-ef1bf216422b43369e5c4ba89cb854dd-0-8\" stroke-width=\"2px\" d=\"M595,264.5 C595,2.0 1625.0,2.0 1625.0,264.5\" fill=\"none\" stroke=\"currentColor\"/>\n",
" <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
" <textPath xlink:href=\"#arrow-ef1bf216422b43369e5c4ba89cb854dd-0-8\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">pd</textPath>\n",
" </text>\n",
" <path class=\"displacy-arrowhead\" d=\"M1625.0,266.5 L1633.0,254.5 1617.0,254.5\" fill=\"currentColor\"/>\n",
"</g>\n",
"</svg></span>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"spacy.displacy.render(doc)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Entire data set"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"# analyze all entries (the slice below restricts the data to a subset and is disabled)\n",
"descr = temp1[['descr', 'num_occur']]\n",
"#descr = descr.iloc[50:200,:]"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"#descr.iat[0,0] = 'Das ist ein Test am 24.08.2023'"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6753"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(descr)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>descr</th>\n",
" <th>num_occur</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>161</th>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>92592</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>Wöchentliche Sichtkontrolle Reinigung</td>\n",
" <td>1654</td>\n",
" </tr>\n",
" <tr>\n",
" <th>130</th>\n",
" <td>Tägliche Überprüfung der Ölabscheider</td>\n",
" <td>1616</td>\n",
" </tr>\n",
" <tr>\n",
" <th>159</th>\n",
" <td>Wöchentliche Kontrolle der WC-Anlagen</td>\n",
" <td>1265</td>\n",
" </tr>\n",
" <tr>\n",
" <th>139</th>\n",
" <td>Halbjährliche Kontrolle des Stabbreithalters</td>\n",
" <td>687</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2665</th>\n",
" <td>Überprüfung der Y-Achse Schneidbrücke am LC 2 ...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2664</th>\n",
" <td>Luftschlauch muss ausgetauscht werden. Ist und...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2663</th>\n",
" <td>Riemenscheibe tauschen auf 650 UPM</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2660</th>\n",
" <td>Durchführung: Sollwert: 20 0,1g</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6752</th>\n",
" <td>Befestigung Deckel für Batteriefach defekt Hal...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6753 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" descr num_occur\n",
"161 Tägliche Wartungstätigkeiten nach Vorgabe des ... 92592\n",
"33 Wöchentliche Sichtkontrolle Reinigung 1654\n",
"130 Tägliche Überprüfung der Ölabscheider 1616\n",
"159 Wöchentliche Kontrolle der WC-Anlagen 1265\n",
"139 Halbjährliche Kontrolle des Stabbreithalters 687\n",
"... ... ...\n",
"2665 Überprüfung der Y-Achse Schneidbrücke am LC 2 ... 1\n",
"2664 Luftschlauch muss ausgetauscht werden. Ist und... 1\n",
"2663 Riemenscheibe tauschen auf 650 UPM 1\n",
"2660 Durchführung: Sollwert: 20 0,1g 1\n",
"6752 Befestigung Deckel für Batteriefach defekt Hal... 1\n",
"\n",
"[6753 rows x 2 columns]"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"descr"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
"#LOAD_CALC_FILES = True\n",
"#LOAD_CALC_FILES = False\n",
"#IS_TEST = True\n",
"IS_TEST = False"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"spell_check_whitelist = {\n",
" '',\n",
" 'beschlag',\n",
" 'brandschutztechnische',\n",
" 'dichtung',\n",
" 'festhaltevorrichtung',\n",
" 'funktion',\n",
" 'halbjährliche',\n",
" 'kontrolle',\n",
" 'maschinenhersteller',\n",
" 'prüfung',\n",
" 'reinigung',\n",
" 'scharnier',\n",
" 'schließvorrichtung',\n",
" 'schmierung',\n",
" 'sichtkontrolle',\n",
" 'stabbreithalter',\n",
" 'technikrundgang',\n",
" 'vorgabe',\n",
" 'wartungstätigkeit',\n",
" 'wcanlage',\n",
" 'ölabscheider',\n",
" 'abarbeiten',\n",
" 'abgleichen',\n",
" 'abschmieren',\n",
" 'abschmierung',\n",
" 'abteilungsleiter',\n",
" 'akku',\n",
" 'analyse',\n",
" 'arbeitsplan',\n",
" 'aschenbecher',\n",
" 'auffüllen',\n",
" 'auflistung',\n",
" 'befestigungsschraube',\n",
" 'beschädigung',\n",
" 'betriebsstunde',\n",
" 'blombe',\n",
" 'blombieren',\n",
" 'brückner',\n",
" 'campenabwickler',\n",
" 'campenaufwickler',\n",
" 'desinfektionsmittel',\n",
" 'dichtigkeit',\n",
" 'druckkontrolle',\n",
" 'efficiosystem',\n",
" 'eigenverantwortlichkeit',\n",
" 'einrichtung',\n",
" 'email',\n",
" 'erledigungsdatum',\n",
" 'extradate',\n",
" 'extradatum',\n",
" 'filter',\n",
" 'firma',\n",
" 'formplatte',\n",
" 'frostprävention',\n",
" 'gegendruckbolze',\n",
" 'gesamtanlage',\n",
" 'heizungsanlage',\n",
" 'keller',\n",
" 'kesselhauskontrolle',\n",
" 'kesselwasser',\n",
" 'koffer',\n",
" 'kompensator',\n",
" 'kompressorstation',\n",
" 'kondensat',\n",
" 'kühlturm',\n",
" 'kühltürme',\n",
" 'lager',\n",
" 'laserabteilung',\n",
" 'leckage',\n",
" 'leerung',\n",
" 'leiterprüfung',\n",
" 'linearkugellager',\n",
" 'luftdruckkontrolle',\n",
" 'magazin',\n",
" 'maschinenbediener',\n",
" 'messwert',\n",
" 'monat',\n",
" 'motor',\n",
" 'papiermüllbehälter',\n",
" 'personalbüro',\n",
" 'pflasterschrank',\n",
" 'rieme',\n",
" 'rollenkette',\n",
" 'rundgang',\n",
" 'schweißkopf',\n",
" 'schweisskopf',\n",
" 'sichtprüfung',\n",
" 'speisewasser',\n",
" 'sprinkleranlage',\n",
" 'temperatursensor',\n",
" 'terminieren',\n",
" 'ticket',\n",
" 'trommel',\n",
" 'täglicher',\n",
" 'uvröhre',\n",
" 'ventilator',\n",
" 'verbandsmaterial',\n",
" 'verschleiß',\n",
" 'verschleiss',\n",
" 'vorbelegung',\n",
" 'wartung',\n",
" 'wartungsarbeit',\n",
" 'wartungsplan',\n",
" 'wasseraufbereitung',\n",
" 'wasseraufbereitungsanlage',\n",
" 'wasserverbrauch',\n",
" 'weberei',\n",
" 'wumagtrockner',\n",
" 'wäscherkontrolle',\n",
" 'wöchig',\n",
" 'abdichten',\n",
" 'abfluprüfung',\n",
" 'ablesen',\n",
" 'abluftkanal',\n",
" 'absauganlage',\n",
" 'abspeichern',\n",
" 'absprache',\n",
" 'aktivkohlepatron',\n",
" 'aktivkohlepatrone',\n",
" 'anbackung',\n",
" 'anfragen',\n",
" 'angebot',\n",
" 'anpresswalze',\n",
" 'ansaug',\n",
" 'anschluss',\n",
" 'anschluß',\n",
" 'anzahl',\n",
" 'auen',\n",
" 'auenbereich',\n",
" 'aueneinheit',\n",
" 'aufwickler',\n",
" 'ausblasöffnung',\n",
" 'ausbrennen',\n",
" 'auslassventil',\n",
" 'ausrüstung',\n",
" 'austausch',\n",
" 'axialpendelrollenlager',\n",
" 'batteriewechsel',\n",
" 'batterieüberprüfung',\n",
" 'baugruppe',\n",
" 'baumwolltuch',\n",
" 'bauteil',\n",
" 'befeuchter',\n",
" 'beleuchtung',\n",
" 'beschichtunglegierung',\n",
" 'besprechungszimmer',\n",
" 'bestandskontrolle',\n",
" 'bestellformular',\n",
" 'bestätigung',\n",
" 'bezeichnung',\n",
" 'binder',\n",
" 'blutstop',\n",
" 'bolze',\n",
" 'breitstreckwalze',\n",
" 'containerstellfläche',\n",
" 'contrawalze',\n",
" 'dachfläche',\n",
" 'dampfzylinder',\n",
" 'deformierung',\n",
" 'dezember',\n",
" 'din',\n",
" 'docke',\n",
" 'dokumentation',\n",
" 'dosierpumpe',\n",
" 'druckluftbehälter',\n",
" 'druckluftleitung',\n",
" 'druckluftschläuche',\n",
" 'drucktestkontrolle',\n",
" 'einterminieren',\n",
" 'eintragung',\n",
" 'einzelprotokoll',\n",
" 'einziehwalze',\n",
" 'elektisch',\n",
" 'element',\n",
" 'enthärtung',\n",
" 'entwässern',\n",
" 'erledigungsbeschreibeung',\n",
" 'erstehilfeeinrichtung',\n",
" 'erweiterung',\n",
" 'explosionsschutzanlage',\n",
" 'extradaten',\n",
" 'exzenterringbefestigung',\n",
" 'fa',\n",
" 'fach',\n",
" 'faltenbalge',\n",
" 'feedbackinput',\n",
" 'feuerwehrumfahrung',\n",
" 'filert',\n",
" 'filteranlage',\n",
" 'filterelement',\n",
" 'filterstufe',\n",
" 'fixtermin',\n",
" 'flanschlager',\n",
" 'flanschlagerquadrat',\n",
" 'fluchtwegsymbol',\n",
" 'flusenabsaugrohr',\n",
" 'freilauf',\n",
" 'fremdkörper',\n",
" 'führungswagen',\n",
" 'gaslager',\n",
" 'gaszählerstand',\n",
" 'gatter',\n",
" 'geräteinner',\n",
" 'geräteinneres',\n",
" 'geräusch',\n",
" 'gesamt',\n",
" 'gesamterzeugt',\n",
" 'getränkeautomat',\n",
" 'gewindebefestigung',\n",
" 'gewindestiftbefestigung',\n",
" 'gleitschiene',\n",
" 'grat',\n",
" 'gro',\n",
" 'grundplatte',\n",
" 'halle',\n",
" 'haupteingang',\n",
" 'hebebühne',\n",
" 'hebezeug',\n",
" 'helm',\n",
" 'hersteller',\n",
" 'hochregal',\n",
" 'hochtemperatur',\n",
" 'hochtemperatureinsatz',\n",
" 'hydraulik',\n",
" 'hydrauliköl',\n",
" 'impulseingang',\n",
" 'indikator',\n",
" 'inneneinheit',\n",
" 'insektenvernichter',\n",
" 'kabel',\n",
" 'kammer',\n",
" 'karton',\n",
" 'kegelradgetriebe',\n",
" 'kegelradgetriebemotor',\n",
" 'kette',\n",
" 'klemmrolle',\n",
" 'klimaanlage',\n",
" 'klimabühne',\n",
" 'klimagerät',\n",
" 'kompressor',\n",
" 'kompressorluftwert',\n",
" 'kontoll',\n",
" 'kontrawalze',\n",
" 'kontroll',\n",
" 'krankheit',\n",
" 'krän',\n",
" 'kräne',\n",
" 'kuehlaggregat',\n",
" 'kw',\n",
" 'kühlgerät',\n",
" 'lagereinheit',\n",
" 'lagereinsatz',\n",
" 'lagerort',\n",
" 'lagerung',\n",
" 'laser',\n",
" 'laufgeräusche',\n",
" 'luftansaugseite',\n",
" 'luftfilter',\n",
" 'luftfilterwasserabscheider',\n",
" 'luftmenge',\n",
" 'luftreiniger',\n",
" 'lösungsmittel',\n",
" 'lüftungsanlage',\n",
" 'macke',\n",
" 'managementsystem',\n",
" 'maschinenanschluss',\n",
" 'materialzersetzung',\n",
" 'messlager',\n",
" 'micron',\n",
" 'mischer',\n",
" 'monatlicher',\n",
" 'monatliches',\n",
" 'monteur',\n",
" 'moos',\n",
" 'motorstart',\n",
" 'nachfetten',\n",
" 'nachschmieren',\n",
" 'nachspann',\n",
" 'neuvertrag',\n",
" 'nord',\n",
" 'nottelefon',\n",
" 'nr',\n",
" 'oberer',\n",
" 'oberflächenkontrolle',\n",
" 'objektkarte',\n",
" 'palette',\n",
" 'pendelkugellager',\n",
" 'pfeifer',\n",
" 'platine',\n",
" 'pneum',\n",
" 'pneumatikventil',\n",
" 'pneumatisch',\n",
" 'pos',\n",
" 'positioniersystem',\n",
" 'prozesskennzahl',\n",
" 'prüfbericht',\n",
" 'prüfplan',\n",
" 'rampenbereich',\n",
" 'rauwalze',\n",
" 'regalprüfer',\n",
" 'regalsicherungsanlage',\n",
" 'reiniger',\n",
" 'reinigungstuch',\n",
" 'restlich',\n",
" 'risikoersatzteil',\n",
" 'rohrtrenner',\n",
" 'roller',\n",
" 'rundgangkontrollen',\n",
" 'rückmeldung',\n",
" 'sae',\n",
" 'sauberkeit',\n",
" 'schlitten',\n",
" 'schmierstoff',\n",
" 'schmierstoffmenge',\n",
" 'schneider',\n",
" 'schraube',\n",
" 'schraubenbestand',\n",
" 'schutzabdeckung',\n",
" 'sicherheitsbeleuchtung',\n",
" 'sicherheitseinrichtung',\n",
" 'sicherheitslichtschranke',\n",
" 'sicherheitsweste',\n",
" 'sicherstellung',\n",
" 'sonotrode',\n",
" 'sonotrodenständer',\n",
" 'spannkopflager',\n",
" 'spannlager',\n",
" 'spannrahmen',\n",
" 'spindel',\n",
" 'spindelhubgetriebe',\n",
" 'spindelmutter',\n",
" 'spülzeitprüfung',\n",
" 'stab',\n",
" 'stadtwasser',\n",
" 'stehlager',\n",
" 'stehlagergehäuse',\n",
" 'steuerung',\n",
" 'stückliste',\n",
" 'systemumstellung',\n",
" 'telefonanlage',\n",
" 'telefonat',\n",
" 'termin',\n",
" 'terminabsprache',\n",
" 'terminiern',\n",
" 'terminiert',\n",
" 'terminierung',\n",
" 'terminvorschlag',\n",
" 'testomat',\n",
" 'thermoheizelement',\n",
" 'torsprechanlage',\n",
" 'trinkwassernetz',\n",
" 'trockenzylinder',\n",
" 'tänzerrolle',\n",
" 'türdichtung',\n",
" 'türgriff',\n",
" 'türsicherung',\n",
" 'umlenkwalzen',\n",
" 'umrandung',\n",
" 'unkraut',\n",
" 'uschienenführung',\n",
" 'uvv',\n",
" 'ventil',\n",
" 'verbaut',\n",
" 'verbrennungsset',\n",
" 'vereinbarung',\n",
" 'verkalkung',\n",
" 'verschleiteileinsatz',\n",
" 'verschmutzung',\n",
" 'verschmutzungenlos',\n",
" 'verstellung',\n",
" 'verunreinigung',\n",
" 'vollständigkeit',\n",
" 'volumenzähler',\n",
" 'vorderer',\n",
" 'vordruck',\n",
" 'vorfilter',\n",
" 'vorfilterflie',\n",
" 'vorliegen',\n",
" 'vormonat',\n",
" 'wartungsintervall',\n",
" 'wartungsvertrag',\n",
" 'wasserfilter',\n",
" 'wasserhärte',\n",
" 'wasserpegelkontrolle',\n",
" 'wasserzählerstand',\n",
" 'wechselintervall',\n",
" 'wärmetauscher',\n",
" 'zahnrieme',\n",
" 'zahnstange',\n",
" 'zuleitung',\n",
" 'zuschicken',\n",
" 'ölfüllung',\n",
" 'ölstand',\n",
" 'ölstandsichtprüfung',\n",
" 'ölstandskontrolle',\n",
" 'überziehen'\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"corrections: dict[str, str] = {\n",
" 'desifektionsmittel': 'desinfektionsmittel',\n",
" 'schweikopf': 'schweisskopf',\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:base:Number of entries processed: 1, Percent completed: 0.01\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:base:Number of entries processed: 501, Percent completed: 7.42\n",
"INFO:base:Number of entries processed: 1001, Percent completed: 14.82\n",
"INFO:base:Number of entries processed: 1501, Percent completed: 22.23\n",
"INFO:base:Number of entries processed: 2001, Percent completed: 29.63\n",
"INFO:base:Number of entries processed: 2501, Percent completed: 37.04\n",
"INFO:base:Number of entries processed: 3001, Percent completed: 44.44\n",
"INFO:base:Number of entries processed: 3501, Percent completed: 51.84\n",
"INFO:base:Number of entries processed: 4001, Percent completed: 59.25\n",
"INFO:base:Number of entries processed: 4501, Percent completed: 66.65\n",
"INFO:base:Number of entries processed: 5001, Percent completed: 74.06\n",
"INFO:base:Number of entries processed: 5501, Percent completed: 81.46\n",
"INFO:base:Number of entries processed: 6001, Percent completed: 88.86\n",
"INFO:base:Number of entries processed: 6501, Percent completed: 96.27\n"
]
}
   ],
   "source": [
    "# collect weighted token connections; they populate the adjacency matrix below\n",
    "connections = dict()\n",
    "unique_tokens = set()\n",
    "UPDATE_STATUS = 500  # log progress every 500 entries\n",
    "length_data = len(descr)\n",
    "spell_check_candidates = set()\n",
    "spell_checker = SpellChecker(language='de', distance=1)\n",
    "\n",
    "if not LOAD_CALC_FILES or IS_TEST:\n",
    "    for count, (_, row) in enumerate(descr.iterrows()):\n",
    "        text = row['descr']\n",
    "        # number of occurrences of this description, passed on as weight\n",
    "        weight = row['num_occur']\n",
    "\n",
    "        doc = nlp(text)\n",
    "\n",
    "        obtain_descendant_info(\n",
    "            doc=doc,\n",
    "            weight=weight,\n",
    "            POS_of_interest=POS_of_interest,\n",
    "            TAG_of_interest=TAG_of_interest,\n",
    "            connections=connections,\n",
    "            unique_tokens=unique_tokens,\n",
    "            spell_check_candidates=spell_check_candidates,\n",
    "            spell_check_whitelist=spell_check_whitelist,\n",
    "            spell_checker=spell_checker,\n",
    "            corrections=corrections,\n",
    "        )\n",
    "\n",
    "        if count % UPDATE_STATUS == 0:\n",
    "            logger.info(f'Number of entries processed: {count+1}, Percent completed: {((count+1) / length_data) * 100:.2f}')"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"ADJ_DF_PATH = './Graphanalyse/adj_mat_df.fth'\n",
"if not IS_TEST:\n",
" if LOAD_CALC_FILES:\n",
" adj_mat_undir = pd.read_feather(ADJ_DF_PATH)\n",
" adj_mat_undir = adj_mat_undir.set_index('index')\n",
" # additional information\n",
" connections = load_pickle('connections.pkl')\n",
" unique_tokens = load_pickle('unique_tokens.pkl')\n",
" else:\n",
" adj_mat = obtain_adj_matrix(unique_tokens=unique_tokens, connections=connections)\n",
" adj_mat_undir = make_undir_adj_matrix(adj_mat=adj_mat)\n",
" save_df = adj_mat_undir.reset_index()\n",
" save_df.to_feather(ADJ_DF_PATH)\n",
" # additional information\n",
" save_pickle(obj=connections, path='connections.pkl')\n",
" save_pickle(obj=unique_tokens, path='unique_tokens.pkl')"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>funktionsfähig</th>\n",
" <th>Rechter</th>\n",
" <th>Laserteller</th>\n",
" <th>vorbereiten</th>\n",
" <th>weiterer</th>\n",
" <th>Ausbau</th>\n",
" <th>Travers</th>\n",
" <th>Funktionsbereitschaft</th>\n",
" <th>umwandeln</th>\n",
" <th>Hechtanlage</th>\n",
" <th>...</th>\n",
" <th>Filterpumpe</th>\n",
" <th>entwickeln</th>\n",
" <th>Pumpenstab</th>\n",
" <th>Hauptrade</th>\n",
" <th>anlernen</th>\n",
" <th>Begutachtung</th>\n",
" <th>Betriebszeit</th>\n",
" <th>Wassereinbruch</th>\n",
" <th>Antriebszahnrad</th>\n",
" <th>Prostataproblem</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>-Austausch</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>-Befestihgung</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>-Bereich</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>-Betonblock-</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>-Bremskombination</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>überziechen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>überziehen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>überziehn</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>üblich</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>üperprüfen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>7046 rows × 7046 columns</p>\n",
"</div>"
],
"text/plain": [
" funktionsfähig Rechter Laserteller vorbereiten \\\n",
"-Austausch 0 0 0 0 \n",
"-Befestihgung 0 0 0 0 \n",
"-Bereich 0 0 0 0 \n",
"-Betonblock- 0 0 0 0 \n",
"-Bremskombination 0 0 0 0 \n",
"... ... ... ... ... \n",
"überziechen 0 0 0 0 \n",
"überziehen 0 0 0 0 \n",
"überziehn 0 0 0 0 \n",
"üblich 0 0 0 0 \n",
"üperprüfen 0 0 0 0 \n",
"\n",
" weiterer Ausbau Travers Funktionsbereitschaft \\\n",
"-Austausch 0 0 0 0 \n",
"-Befestihgung 0 0 0 0 \n",
"-Bereich 0 0 0 0 \n",
"-Betonblock- 0 0 0 0 \n",
"-Bremskombination 0 0 0 0 \n",
"... ... ... ... ... \n",
"überziechen 0 0 0 0 \n",
"überziehen 0 0 0 0 \n",
"überziehn 0 0 0 0 \n",
"üblich 0 0 0 0 \n",
"üperprüfen 0 0 0 0 \n",
"\n",
" umwandeln Hechtanlage ... Filterpumpe entwickeln \\\n",
"-Austausch 0 0 ... 0 0 \n",
"-Befestihgung 0 0 ... 0 0 \n",
"-Bereich 0 0 ... 0 0 \n",
"-Betonblock- 0 0 ... 0 0 \n",
"-Bremskombination 0 0 ... 0 0 \n",
"... ... ... ... ... ... \n",
"überziechen 0 0 ... 0 0 \n",
"überziehen 0 0 ... 0 0 \n",
"überziehn 0 0 ... 0 0 \n",
"üblich 0 0 ... 0 0 \n",
"üperprüfen 0 0 ... 0 0 \n",
"\n",
" Pumpenstab Hauptrade anlernen Begutachtung \\\n",
"-Austausch 0 0 0 0 \n",
"-Befestihgung 0 0 0 0 \n",
"-Bereich 0 0 0 0 \n",
"-Betonblock- 0 0 0 0 \n",
"-Bremskombination 0 0 0 0 \n",
"... ... ... ... ... \n",
"überziechen 0 0 0 0 \n",
"überziehen 0 0 0 0 \n",
"überziehn 0 0 0 0 \n",
"üblich 0 0 0 0 \n",
"üperprüfen 0 0 0 0 \n",
"\n",
" Betriebszeit Wassereinbruch Antriebszahnrad \\\n",
"-Austausch 0 0 0 \n",
"-Befestihgung 0 0 0 \n",
"-Bereich 0 0 0 \n",
"-Betonblock- 0 0 0 \n",
"-Bremskombination 0 0 0 \n",
"... ... ... ... \n",
"überziechen 0 0 0 \n",
"überziehen 0 0 0 \n",
"überziehn 0 0 0 \n",
"üblich 0 0 0 \n",
"üperprüfen 0 0 0 \n",
"\n",
" Prostataproblem \n",
"-Austausch 0 \n",
"-Befestihgung 0 \n",
"-Bereich 0 \n",
"-Betonblock- 0 \n",
"-Bremskombination 0 \n",
"... ... \n",
"überziechen 0 \n",
"überziehen 0 \n",
"überziehn 0 \n",
"üblich 0 \n",
"üperprüfen 0 \n",
"\n",
"[7046 rows x 7046 columns]"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adj_mat_undir.sort_index()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>koennen</th>\n",
" <th>Weiterleitung</th>\n",
" <th>Brand</th>\n",
" <th>ein</th>\n",
" <th>Geräteinneres</th>\n",
" <th>Schmerz</th>\n",
" <th>Monat</th>\n",
" <th>Kontrawalzenbelag</th>\n",
" <th>Funktionstest</th>\n",
" <th>Kesselwasser</th>\n",
" <th>...</th>\n",
" <th>Niveau</th>\n",
" <th>Sprinkleranlage</th>\n",
" <th>Abdeckglas</th>\n",
" <th>Stoptast</th>\n",
" <th>ORing</th>\n",
" <th>ausblasen</th>\n",
" <th>absprechen</th>\n",
" <th>Artikelnummer</th>\n",
" <th>Fehlersichtung</th>\n",
" <th>brannen</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>koennen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Weiterleitung</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Brand</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ein</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Geräteinneres</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ausblasen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>absprechen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Artikelnummer</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Fehlersichtung</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>brannen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6959 rows × 6959 columns</p>\n",
"</div>"
],
"text/plain": [
" koennen Weiterleitung Brand ein Geräteinneres Schmerz \\\n",
"koennen 0 0 0 0 0 0 \n",
"Weiterleitung 0 0 0 0 0 0 \n",
"Brand 0 0 0 0 0 0 \n",
"ein 0 0 0 0 0 0 \n",
"Geräteinneres 0 0 0 0 0 0 \n",
"... ... ... ... ... ... ... \n",
"ausblasen 0 0 0 0 0 0 \n",
"absprechen 0 0 0 0 0 0 \n",
"Artikelnummer 0 0 0 0 0 0 \n",
"Fehlersichtung 0 0 0 0 0 0 \n",
"brannen 0 0 0 0 0 0 \n",
"\n",
" Monat Kontrawalzenbelag Funktionstest Kesselwasser ... \\\n",
"koennen 0 0 0 0 ... \n",
"Weiterleitung 0 0 0 0 ... \n",
"Brand 0 0 0 0 ... \n",
"ein 0 0 0 0 ... \n",
"Geräteinneres 0 0 0 0 ... \n",
"... ... ... ... ... ... \n",
"ausblasen 0 0 0 0 ... \n",
"absprechen 0 0 0 0 ... \n",
"Artikelnummer 0 0 0 0 ... \n",
"Fehlersichtung 0 0 0 0 ... \n",
"brannen 0 0 0 0 ... \n",
"\n",
" Niveau Sprinkleranlage Abdeckglas Stoptast ORing \\\n",
"koennen 0 0 0 0 0 \n",
"Weiterleitung 0 0 0 0 0 \n",
"Brand 0 0 0 0 0 \n",
"ein 0 0 0 0 0 \n",
"Geräteinneres 0 0 0 0 0 \n",
"... ... ... ... ... ... \n",
"ausblasen 0 0 0 0 0 \n",
"absprechen 0 0 0 0 0 \n",
"Artikelnummer 0 0 0 0 0 \n",
"Fehlersichtung 0 0 0 0 0 \n",
"brannen 0 0 0 0 0 \n",
"\n",
" ausblasen absprechen Artikelnummer Fehlersichtung brannen \n",
"koennen 0 0 0 0 0 \n",
"Weiterleitung 0 0 0 0 0 \n",
"Brand 0 0 0 0 0 \n",
"ein 0 0 0 0 0 \n",
"Geräteinneres 0 0 0 0 0 \n",
"... ... ... ... ... ... \n",
"ausblasen 0 0 0 0 0 \n",
"absprechen 0 0 0 0 0 \n",
"Artikelnummer 0 0 0 0 0 \n",
"Fehlersichtung 0 0 0 0 0 \n",
"brannen 0 0 0 0 0 \n",
"\n",
"[6959 rows x 6959 columns]"
]
},
"execution_count": 776,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adj_mat_undir"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"arr = adj_mat_undir.to_numpy()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"24490"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.count_nonzero(arr)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"92882"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.max(arr)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"195"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"uni_arr = np.unique(arr)\n",
"len(uni_arr)"
]
},
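  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Threshold: set every edge weight below `WEIGHT_THRESHOLD` to zero so that only strongly weighted token connections remain."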
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"WEIGHT_THRESHOLD = 50"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"arr = adj_mat_undir.to_numpy()"
]
},
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# zero out all edge weights below the threshold\n",
    "arr = np.where(arr < WEIGHT_THRESHOLD, 0, arr)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"387"
]
},
"execution_count": 788,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.count_nonzero(arr)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"177"
]
},
"execution_count": 789,
"metadata": {},
"output_type": "execute_result"
}
   ],
   "source": [
    "# count tokens that keep at least one edge after thresholding (non-zero column sum)\n",
    "temp = np.sum(arr, axis=0)\n",
    "np.count_nonzero(temp)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"thresh_adj_mat = adj_mat_undir.copy()\n",
"thresh_adj_mat.loc[:] = arr"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>koennen</th>\n",
" <th>Weiterleitung</th>\n",
" <th>Brand</th>\n",
" <th>ein</th>\n",
" <th>Geräteinneres</th>\n",
" <th>Schmerz</th>\n",
" <th>Monat</th>\n",
" <th>Kontrawalzenbelag</th>\n",
" <th>Funktionstest</th>\n",
" <th>Kesselwasser</th>\n",
" <th>...</th>\n",
" <th>Niveau</th>\n",
" <th>Sprinkleranlage</th>\n",
" <th>Abdeckglas</th>\n",
" <th>Stoptast</th>\n",
" <th>ORing</th>\n",
" <th>ausblasen</th>\n",
" <th>absprechen</th>\n",
" <th>Artikelnummer</th>\n",
" <th>Fehlersichtung</th>\n",
" <th>brannen</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>koennen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Weiterleitung</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Brand</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ein</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Geräteinneres</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ausblasen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>absprechen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Artikelnummer</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Fehlersichtung</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>brannen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6959 rows × 6959 columns</p>\n",
"</div>"
],
"text/plain": [
" koennen Weiterleitung Brand ein Geräteinneres Schmerz \\\n",
"koennen 0 0 0 0 0 0 \n",
"Weiterleitung 0 0 0 0 0 0 \n",
"Brand 0 0 0 0 0 0 \n",
"ein 0 0 0 0 0 0 \n",
"Geräteinneres 0 0 0 0 0 0 \n",
"... ... ... ... ... ... ... \n",
"ausblasen 0 0 0 0 0 0 \n",
"absprechen 0 0 0 0 0 0 \n",
"Artikelnummer 0 0 0 0 0 0 \n",
"Fehlersichtung 0 0 0 0 0 0 \n",
"brannen 0 0 0 0 0 0 \n",
"\n",
" Monat Kontrawalzenbelag Funktionstest Kesselwasser ... \\\n",
"koennen 0 0 0 0 ... \n",
"Weiterleitung 0 0 0 0 ... \n",
"Brand 0 0 0 0 ... \n",
"ein 0 0 0 0 ... \n",
"Geräteinneres 0 0 0 0 ... \n",
"... ... ... ... ... ... \n",
"ausblasen 0 0 0 0 ... \n",
"absprechen 0 0 0 0 ... \n",
"Artikelnummer 0 0 0 0 ... \n",
"Fehlersichtung 0 0 0 0 ... \n",
"brannen 0 0 0 0 ... \n",
"\n",
" Niveau Sprinkleranlage Abdeckglas Stoptast ORing \\\n",
"koennen 0 0 0 0 0 \n",
"Weiterleitung 0 0 0 0 0 \n",
"Brand 0 0 0 0 0 \n",
"ein 0 0 0 0 0 \n",
"Geräteinneres 0 0 0 0 0 \n",
"... ... ... ... ... ... \n",
"ausblasen 0 0 0 0 0 \n",
"absprechen 0 0 0 0 0 \n",
"Artikelnummer 0 0 0 0 0 \n",
"Fehlersichtung 0 0 0 0 0 \n",
"brannen 0 0 0 0 0 \n",
"\n",
" ausblasen absprechen Artikelnummer Fehlersichtung brannen \n",
"koennen 0 0 0 0 0 \n",
"Weiterleitung 0 0 0 0 0 \n",
"Brand 0 0 0 0 0 \n",
"ein 0 0 0 0 0 \n",
"Geräteinneres 0 0 0 0 0 \n",
"... ... ... ... ... ... \n",
"ausblasen 0 0 0 0 0 \n",
"absprechen 0 0 0 0 0 \n",
"Artikelnummer 0 0 0 0 0 \n",
"Fehlersichtung 0 0 0 0 0 \n",
"brannen 0 0 0 0 0 \n",
"\n",
"[6959 rows x 6959 columns]"
]
},
"execution_count": 791,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"thresh_adj_mat"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ADJ_MAT_PATH_CSV = f'./Graphanalyse/adj_mat_thresh_{WEIGHT_THRESHOLD}.csv'\n",
"thresh_adj_mat.to_csv(path_or_buf=ADJ_MAT_PATH_CSV, encoding='cp1252', sep=';')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Testing***"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"important_words = []\n",
"all_entities = []\n",
"pos_tags = set()\n",
"pos_counter = dict()\n",
"token_counter = 0\n",
"\n",
"for description in descr:\n",
"    doc = nlp(description)\n",
"\n",
"    relevant_words = []\n",
"    for token in doc:\n",
"        POS = token.pos_\n",
"        token_counter += 1\n",
"        # count how often each part-of-speech tag occurs\n",
"        pos_counter[POS] = pos_counter.get(POS, 0) + 1\n",
"\n",
"        # keep only content words: no stop words, punctuation, or whitespace\n",
"        if (not token.is_stop and not token.is_punct and\n",
"                not token.is_space and\n",
"                POS in ('NOUN', 'PROPN', 'ADJ', 'ADV')):\n",
"            relevant_words.append((token.lemma_.lower(), POS))\n",
"            #pos_tags.add(token.pos_)\n",
"\n",
"    # named entities recognized in the description\n",
"    entities = []\n",
"    for ent in doc.ents:\n",
"        entities.append((ent.text, ent.label_))\n",
"\n",
"    important_words.extend(relevant_words)\n",
"    all_entities.extend(entities)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('täglich', 'ADJ'),\n",
" ('wartungstätigkeit', 'NOUN'),\n",
" ('vorgabe', 'NOUN'),\n",
" ('maschinenhersteller', 'NOUN'),\n",
" ('wöchentliche', 'ADJ'),\n",
" ('sichtkontrolle', 'NOUN'),\n",
" ('reinigung', 'NOUN'),\n",
" ('täglich', 'ADJ'),\n",
" ('überprüfung', 'NOUN'),\n",
" ('ölabscheider', 'NOUN'),\n",
" ('wöchentlich', 'ADJ'),\n",
" ('kontrolle', 'NOUN'),\n",
" ('wc-anlage', 'NOUN'),\n",
" ('halbjährliche', 'ADJ'),\n",
" ('kontrolle', 'NOUN'),\n",
" ('stabbreithalter', 'NOUN'),\n",
" ('brandschutztechnische', 'ADJ'),\n",
" ('prüfung', 'NOUN'),\n",
" ('prüfung', 'NOUN'),\n",
" ('scharniere', 'NOUN'),\n",
" ('dichtung', 'NOUN'),\n",
" ('schließvorrichtung', 'NOUN'),\n",
" ('schloß', 'NOUN'),\n",
" ('beschlag', 'NOUN'),\n",
" ('allgemein', 'ADJ'),\n",
" ('funktion', 'NOUN'),\n",
" ('schmierung', 'NOUN'),\n",
" ('festhaltevorrichtung', 'NOUN'),\n",
" ('täglich', 'ADJ'),\n",
" ('technikrundgang', 'NOUN'),\n",
" ('monatliche', 'ADJ'),\n",
" ('sichtkontrolle', 'NOUN'),\n",
" ('monatliche', 'ADJ'),\n",
" ('prüfung', 'NOUN'),\n",
" ('scharniere', 'NOUN'),\n",
" ('dichtung', 'NOUN'),\n",
" ('schließvorrichtung', 'NOUN'),\n",
" ('schloß', 'NOUN'),\n",
" ('beschlag', 'NOUN'),\n",
" ('allgemein', 'ADJ'),\n",
" ('funktion', 'NOUN'),\n",
" ('schmierung', 'NOUN'),\n",
" ('festhaltevorrichtung', 'NOUN')]"
]
},
"execution_count": 221,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"important_words"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"43"
]
},
"execution_count": 222,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(important_words)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 223,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_entities"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"count = Counter(important_words)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Counter({('täglich', 'ADJ'): 3,\n",
" ('prüfung', 'NOUN'): 3,\n",
" ('sichtkontrolle', 'NOUN'): 2,\n",
" ('kontrolle', 'NOUN'): 2,\n",
" ('scharniere', 'NOUN'): 2,\n",
" ('dichtung', 'NOUN'): 2,\n",
" ('schließvorrichtung', 'NOUN'): 2,\n",
" ('schloß', 'NOUN'): 2,\n",
" ('beschlag', 'NOUN'): 2,\n",
" ('allgemein', 'ADJ'): 2,\n",
" ('funktion', 'NOUN'): 2,\n",
" ('schmierung', 'NOUN'): 2,\n",
" ('festhaltevorrichtung', 'NOUN'): 2,\n",
" ('monatliche', 'ADJ'): 2,\n",
" ('wartungstätigkeit', 'NOUN'): 1,\n",
" ('vorgabe', 'NOUN'): 1,\n",
" ('maschinenhersteller', 'NOUN'): 1,\n",
" ('wöchentliche', 'ADJ'): 1,\n",
" ('reinigung', 'NOUN'): 1,\n",
" ('überprüfung', 'NOUN'): 1,\n",
" ('ölabscheider', 'NOUN'): 1,\n",
" ('wöchentlich', 'ADJ'): 1,\n",
" ('wc-anlage', 'NOUN'): 1,\n",
" ('halbjährliche', 'ADJ'): 1,\n",
" ('stabbreithalter', 'NOUN'): 1,\n",
" ('brandschutztechnische', 'ADJ'): 1,\n",
" ('technikrundgang', 'NOUN'): 1})"
]
},
"execution_count": 225,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"count"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"NOUN 25722\n",
"PUNCT 11626\n",
"VERB 9093\n",
"ADP 7211\n",
"ADV 6526\n",
"PROPN 4481\n",
"NUM 4115\n",
"DET 3845\n",
"ADJ 2576\n",
"AUX 2329\n",
"PART 1561\n",
"CCONJ 1305\n",
"X 999\n",
"PRON 916\n",
"SCONJ 385\n",
"SPACE 236\n",
"INTJ 1\n",
"dtype: int64"
]
},
"execution_count": 180,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pos_count = pd.Series(data=pos_counter)\n",
"pos_count.sort_values(ascending=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"NOUN 0.310176\n",
"PUNCT 0.140196\n",
"VERB 0.109651\n",
"ADP 0.086956\n",
"ADV 0.078696\n",
"PROPN 0.054035\n",
"NUM 0.049622\n",
"DET 0.046366\n",
"ADJ 0.031063\n",
"AUX 0.028085\n",
"PART 0.018824\n",
"CCONJ 0.015737\n",
"X 0.012047\n",
"PRON 0.011046\n",
"SCONJ 0.004643\n",
"SPACE 0.002846\n",
"INTJ 0.000012\n",
"dtype: float64"
]
},
"execution_count": 184,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pos_count_rel = pos_count / pos_count.sum()\n",
"pos_count_rel.sort_values(ascending=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"82927"
]
},
"execution_count": 181,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"token_counter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further analysis of the descriptions\n",
"\n",
"- clarify the unclear relationships in the results for threshold 1200:\n",
"  - find the corresponding descriptions\n",
"  - put them into context\n",
"- identify additional blacklist words"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unclear relationships at threshold 1200"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>descr</th>\n",
" <th>len</th>\n",
" <th>num_occur</th>\n",
" <th>assoc_obj_ids</th>\n",
" <th>num_assoc_obj_ids</th>\n",
" </tr>\n",
" <tr>\n",
" <th>index</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>161</th>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>66</td>\n",
" <td>92592</td>\n",
" <td>[0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53...</td>\n",
" <td>206</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>Wöchentliche Sichtkontrolle Reinigung</td>\n",
" <td>37</td>\n",
" <td>1654</td>\n",
" <td>[301, 304, 305, 313, 314, 331, 332, 510, 511, ...</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>130</th>\n",
" <td>Tägliche Überprüfung der Ölabscheider</td>\n",
" <td>37</td>\n",
" <td>1616</td>\n",
" <td>[0, 970, 2134, 2137]</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>159</th>\n",
" <td>Wöchentliche Kontrolle der WC-Anlagen</td>\n",
" <td>37</td>\n",
" <td>1265</td>\n",
" <td>[1352, 1353, 1354, 1684, 1685, 1686, 1687, 168...</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>139</th>\n",
" <td>Halbjährliche Kontrolle des Stabbreithalters</td>\n",
" <td>44</td>\n",
" <td>687</td>\n",
" <td>[51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6...</td>\n",
" <td>166</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2675</th>\n",
" <td>Stand 15.07.2020 Stöppel: Herr Langner Toyota ...</td>\n",
" <td>253</td>\n",
" <td>1</td>\n",
" <td>[311]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2674</th>\n",
" <td>Zahnräder der Laufkatze verschlissen Ersatztei...</td>\n",
" <td>167</td>\n",
" <td>1</td>\n",
" <td>[415]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2673</th>\n",
" <td>Bitte 8 Scheiben nach Muster anfertigen. Danke.</td>\n",
" <td>47</td>\n",
" <td>1</td>\n",
" <td>[140]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2672</th>\n",
" <td>Schalter für Bühne Schwenken abgerissen, bitte...</td>\n",
" <td>123</td>\n",
" <td>1</td>\n",
" <td>[323]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6781</th>\n",
" <td>Befestigung Deckel für Batteriefach defekt Hal...</td>\n",
" <td>99</td>\n",
" <td>1</td>\n",
" <td>[326]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6782 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" descr len num_occur \\\n",
"index \n",
"161 Tägliche Wartungstätigkeiten nach Vorgabe des ... 66 92592 \n",
"33 Wöchentliche Sichtkontrolle Reinigung 37 1654 \n",
"130 Tägliche Überprüfung der Ölabscheider 37 1616 \n",
"159 Wöchentliche Kontrolle der WC-Anlagen 37 1265 \n",
"139 Halbjährliche Kontrolle des Stabbreithalters 44 687 \n",
"... ... ... ... \n",
"2675 Stand 15.07.2020 Stöppel: Herr Langner Toyota ... 253 1 \n",
"2674 Zahnräder der Laufkatze verschlissen Ersatztei... 167 1 \n",
"2673 Bitte 8 Scheiben nach Muster anfertigen. Danke. 47 1 \n",
"2672 Schalter für Bühne Schwenken abgerissen, bitte... 123 1 \n",
"6781 Befestigung Deckel für Batteriefach defekt Hal... 99 1 \n",
"\n",
" assoc_obj_ids num_assoc_obj_ids \n",
"index \n",
"161 [0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53... 206 \n",
"33 [301, 304, 305, 313, 314, 331, 332, 510, 511, ... 18 \n",
"130 [0, 970, 2134, 2137] 4 \n",
"159 [1352, 1353, 1354, 1684, 1685, 1686, 1687, 168... 11 \n",
"139 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6... 166 \n",
"... ... ... \n",
"2675 [311] 1 \n",
"2674 [415] 1 \n",
"2673 [140] 1 \n",
"2672 [323] 1 \n",
"6781 [326] 1 \n",
"\n",
"[6782 rows x 5 columns]"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional filter: keep only descriptions that occur at least 3 times\n",
"#temp2 = temp1.loc[temp1['num_occur'] >= 3, :].copy()\n",
"temp2 = temp1.copy()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#temp2 = temp2.iloc[:30,:]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# lemmas are lower-cased below, so the search terms are lower-cased as well\n",
"check_words = {word.lower() for word in ['E1.8']}\n",
"target_indices = list()\n",
"\n",
"for idx, row in temp2.iterrows():\n",
"\n",
"    text = row['descr']\n",
"    doc = nlp(text)\n",
"\n",
"    # collect the lower-cased lemmas of all tokens with a relevant POS or tag\n",
"    token_set = set()\n",
"    for token in doc:\n",
"\n",
"        if not (token.pos_ in POS_of_interest or token.tag_ in TAG_of_interest):\n",
"            continue\n",
"\n",
"        token_set.add(token.lemma_.lower())\n",
"        #print(f'{token_set=}')\n",
"\n",
"    # keep the row if its lemmas contain every search term\n",
"    if token_set.issuperset(check_words):\n",
"        target_indices.append(idx)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"target_indices"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Vorgaben aus Pleva Wartungsplan Schmieren der Rollenlager der beiden Kameralaufschlitten des Strukturdetektors SD 1C siehe Extradaten'"
]
},
"execution_count": 506,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"idx = target_indices[3]\n",
"temp2.at[idx, 'descr']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Leiterprüfung derzeit in Arbeit Abteilungsleiter sind per Email am 11.06.2019 über deren Eigenverantwortlichkeit und Mithilfe durch Herr Graf informiert worden.'"
]
},
"execution_count": 229,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp2.at[1921,'descr']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 197,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"token_set.issuperset(check_words)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'ADJD'}"
]
},
"execution_count": 180,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"POS_of_interest\n",
"TAG_of_interest"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test = 'Tägliche, tägliche Wartungstätigkeit des Maschinenherstellers Maschine'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"doc = nlp(test)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"täglich\n",
"--\n",
"täglich\n",
"wartungstätigkeit\n",
"der\n",
"maschinenhersteller\n",
"maschine\n"
]
}
],
"source": [
"for token in doc:\n",
"    print(token.lemma_.lower())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# separator characters to strip before tokenizing (plain strings, not regex)\n",
"replace_chars = [',', '\\n', '\\t']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test = test.lower()\n",
"for char in replace_chars:\n",
"    # replace by a space so that neighboring words are not fused\n",
"    test = test.replace(char, ' ')\n",
"test = test.split()\n",
"test = set(test)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'des', 'maschine', 'maschinenherstellers', 'tägliche', 'wartungstätigkeit'}"
]
},
"execution_count": 112,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test.issuperset(check_words)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Interim results:**\n",
"\n",
"*certain ObjektIDs have the escape character, others do not; no ObjektID occurs with both variants*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of duplicates = 47689 for description with index no. 171:\n",
" Tägliche Wartungstätigkeiten nach Vorgabe des Maschinenherstellers\n",
"\n"
]
}
],
"source": [
"print(f\"Number of duplicates = {max_val} for description with index no. {index}:\\n {text}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"# Feature 2: VorgangsArtText"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"feature = 'VorgangsArtText'"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
"base = wo_duplicates.copy()\n",
"base = base.dropna(axis=0, subset=feature)\n",
"base[feature] = base[feature].map(clean_string)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>VorgangsID</th>\n",
" <th>ObjektID</th>\n",
" <th>HObjektText</th>\n",
" <th>ObjektArtID</th>\n",
" <th>ObjektArtText</th>\n",
" <th>VorgangsTypID</th>\n",
" <th>VorgangsTypName</th>\n",
" <th>VorgangsDatum</th>\n",
" <th>VorgangsStatusId</th>\n",
" <th>VorgangsPrioritaet</th>\n",
" <th>VorgangsBeschreibung</th>\n",
" <th>VorgangsOrt</th>\n",
" <th>VorgangsArtText</th>\n",
" <th>ErledigungsDatum</th>\n",
" <th>ErledigungsArtText</th>\n",
" <th>ErledigungsBeschreibung</th>\n",
" <th>MPMelderArbeitsplatz</th>\n",
" <th>MPAbteilungBezeichnung</th>\n",
" <th>Arbeitsbeginn</th>\n",
" <th>ErstellungsDatum</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>11</td>\n",
" <td>114</td>\n",
" <td>427 C , Webmaschine, DL 280 EMS Breite 280</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-06</td>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Kettbaum kaputt</td>\n",
" <td>2019-03-06</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Weberei</td>\n",
" <td>Weberei</td>\n",
" <td>NaT</td>\n",
" <td>2019-03-06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>17</td>\n",
" <td>124</td>\n",
" <td>621 C , Webmaschine, DL 280 EMS Breite 280</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-11</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>asgasdg</td>\n",
" <td>2019-03-11</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Elektrowerkstatt</td>\n",
" <td>Elektrowerkstatt</td>\n",
" <td>NaT</td>\n",
" <td>2019-03-11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>53</td>\n",
" <td>244</td>\n",
" <td>285 C, Webmaschine, SG 220 EMS</td>\n",
" <td>5</td>\n",
" <td>Greifer-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-19</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Kupplung schleift</td>\n",
" <td>NaN</td>\n",
" <td>Kupplung defekt</td>\n",
" <td>2019-03-20</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>NaN</td>\n",
" <td>Weberei</td>\n",
" <td>Weberei</td>\n",
" <td>NaT</td>\n",
" <td>2019-03-19</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>58</td>\n",
" <td>257</td>\n",
" <td>107, Webmaschine, OM 220 EOS</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-21</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Gegengewicht wieder anbringen</td>\n",
" <td>NaN</td>\n",
" <td>Gegengewicht an der Webmaschine abgefallen</td>\n",
" <td>2019-03-21</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>Schraube ausgebohrt\\nGegengewicht wieder angeb...</td>\n",
" <td>Weberei</td>\n",
" <td>Weberei</td>\n",
" <td>2019-03-21</td>\n",
" <td>2019-03-21</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>81</td>\n",
" <td>138</td>\n",
" <td>00138, Schärmaschine 9,</td>\n",
" <td>16</td>\n",
" <td>Schärmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-25</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>da ist etwas gebrochen. (Herr Heininger)</td>\n",
" <td>NaN</td>\n",
" <td>zentrale Bremsenverstellung linke Gatterseite ...</td>\n",
" <td>2019-03-25</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>Bolzen gebrochen. Bolzen neu angefertig und di...</td>\n",
" <td>Vorwerk</td>\n",
" <td>Vorwerk</td>\n",
" <td>2019-03-25</td>\n",
" <td>2019-03-25</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" VorgangsID ObjektID HObjektText \\\n",
"0 11 114 427 C , Webmaschine, DL 280 EMS Breite 280 \n",
"1 17 124 621 C , Webmaschine, DL 280 EMS Breite 280 \n",
"2 53 244 285 C, Webmaschine, SG 220 EMS \n",
"3 58 257 107, Webmaschine, OM 220 EOS \n",
"4 81 138 00138, Schärmaschine 9, \n",
"\n",
" ObjektArtID ObjektArtText VorgangsTypID VorgangsTypName \\\n",
"0 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n",
"1 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n",
"2 5 Greifer-Webmaschine 3 Reparaturauftrag (Portal) \n",
"3 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n",
"4 16 Schärmaschine 3 Reparaturauftrag (Portal) \n",
"\n",
" VorgangsDatum VorgangsStatusId VorgangsPrioritaet \\\n",
"0 2019-03-06 4 0 \n",
"1 2019-03-11 5 0 \n",
"2 2019-03-19 5 0 \n",
"3 2019-03-21 5 0 \n",
"4 2019-03-25 5 0 \n",
"\n",
" VorgangsBeschreibung VorgangsOrt \\\n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 Kupplung schleift NaN \n",
"3 Gegengewicht wieder anbringen NaN \n",
"4 da ist etwas gebrochen. (Herr Heininger) NaN \n",
"\n",
" VorgangsArtText ErledigungsDatum \\\n",
"0 Kettbaum kaputt 2019-03-06 \n",
"1 asgasdg 2019-03-11 \n",
"2 Kupplung defekt 2019-03-20 \n",
"3 Gegengewicht an der Webmaschine abgefallen 2019-03-21 \n",
"4 zentrale Bremsenverstellung linke Gatterseite ... 2019-03-25 \n",
"\n",
" ErledigungsArtText ErledigungsBeschreibung \\\n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 Reparatur UTT NaN \n",
"3 Reparatur UTT Schraube ausgebohrt\\nGegengewicht wieder angeb... \n",
"4 Reparatur UTT Bolzen gebrochen. Bolzen neu angefertig und di... \n",
"\n",
" MPMelderArbeitsplatz MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n",
"0 Weberei Weberei NaT 2019-03-06 \n",
"1 Elektrowerkstatt Elektrowerkstatt NaT 2019-03-11 \n",
"2 Weberei Weberei NaT 2019-03-19 \n",
"3 Weberei Weberei 2019-03-21 2019-03-21 \n",
"4 Vorwerk Vorwerk 2019-03-25 2019-03-25 "
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"base.head()"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Entries: 128936\n"
]
}
],
"source": [
"descriptions = base[feature]\n",
"print(f\"Entries: {len(descriptions)}\")"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of duplicate VorgangsArtText: 128545\n",
"Number of unique VorgangsArtText: 391\n",
"Share of unique VorgangsArtText: 0.30 %\n"
]
}
],
"source": [
"num_dupl_descr = descriptions.duplicated().sum()\n",
"uni_descr = descriptions.unique()\n",
"num_uni_descr = len(uni_descr)\n",
"\n",
"print(f\"Number of duplicate {feature}: {num_dupl_descr}\")\n",
"print(f\"Number of unique {feature}: {num_uni_descr}\")\n",
"print(f\"Share of unique {feature}: {num_uni_descr / len(descriptions) * 100:.2f} %\")"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"if not LOAD_CALC_FILES:\n",
"    cols = ['descr', 'len', 'num_occur', 'assoc_obj_ids', 'num_assoc_obj_ids']\n",
"    rows = []  # one row per unique description; the frame is built once at the end\n",
"    max_val = 0\n",
"    text = None\n",
"    index = 0\n",
"\n",
"    for idx, description in enumerate(uni_descr):\n",
"        filt = base[feature] == description\n",
"        # sorted unique ObjektIDs associated with this description\n",
"        assoc_obj_ids = np.sort(base.loc[filt, 'ObjektID'].unique(), kind='stable')\n",
"        num_dupl = filt.sum()\n",
"\n",
"        rows.append([\n",
"            description,\n",
"            len(description),\n",
"            num_dupl,\n",
"            assoc_obj_ids,\n",
"            len(assoc_obj_ids)\n",
"        ])\n",
"\n",
"        # remember the most frequent description\n",
"        if num_dupl > max_val:\n",
"            max_val = num_dupl\n",
"            index = idx\n",
"            text = description\n",
"\n",
"    descr_df = pd.DataFrame(data=rows, columns=cols)\n",
"    temp1 = descr_df.sort_values(by='num_occur', ascending=False)"
]
},
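{
"cell_type": "markdown",
"metadata": {},
"source": [
"The per-description statistics can also be derived with a single `groupby` instead of one filter pass per unique text. The following cell is a minimal sketch under assumptions, not the method used in this notebook: `demo` is a hypothetical miniature stand-in for `base`, and `summary` is an illustrative name. The order of rows with equal counts may differ from the loop-based result."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical miniature stand-in for `base` (assumption, not project data)\n",
"demo = pd.DataFrame({\n",
"    'VorgangsArtText': ['Wartung', 'Wartung', 'Reparatur', 'Wartung'],\n",
"    'ObjektID': [3, 1, 2, 1],\n",
"})\n",
"\n",
"# one row per unique text: occurrences, sorted unique ObjektIDs, and their count\n",
"summary = (\n",
"    demo.groupby('VorgangsArtText')['ObjektID']\n",
"    .agg(num_occur='size',\n",
"         assoc_obj_ids=lambda ids: sorted(ids.unique()),\n",
"         num_assoc_obj_ids='nunique')\n",
"    .reset_index()\n",
"    .rename(columns={'VorgangsArtText': 'descr'})\n",
"    .sort_values(by='num_occur', ascending=False)\n",
")\n",
"summary['len'] = summary['descr'].str.len()\n",
"summary"
]
},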
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>descr</th>\n",
" <th>len</th>\n",
" <th>num_occur</th>\n",
" <th>assoc_obj_ids</th>\n",
" <th>num_assoc_obj_ids</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>60</th>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>44</td>\n",
" <td>92719</td>\n",
" <td>[0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53...</td>\n",
" <td>206</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>01 Interne Reinigung Pflege Überprüfung</td>\n",
" <td>39</td>\n",
" <td>11250</td>\n",
" <td>[0, 7, 425, 426, 427, 428, 429, 517, 518, 576,...</td>\n",
" <td>349</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>02 Interne Reinigung Pflege Überprüfung</td>\n",
" <td>39</td>\n",
" <td>3263</td>\n",
" <td>[576, 906, 910, 940, 941, 942, 943, 1040, 1041...</td>\n",
" <td>52</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>Maschinen-Wartung wöchentlich</td>\n",
" <td>29</td>\n",
" <td>2408</td>\n",
" <td>[1, 301, 305, 313, 314, 331, 332, 510, 511, 51...</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>46</th>\n",
" <td>Gesetzliche Wartung Prüfung jährlich</td>\n",
" <td>36</td>\n",
" <td>2403</td>\n",
" <td>[0, 191, 193, 195, 197, 200, 287, 288, 289, 29...</td>\n",
" <td>638</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>222</th>\n",
" <td>Walze WK 03 Umlenkwalze zapfen</td>\n",
" <td>30</td>\n",
" <td>1</td>\n",
" <td>[1]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>224</th>\n",
" <td>Leiter Nr. 90 und überprüfen</td>\n",
" <td>28</td>\n",
" <td>1</td>\n",
" <td>[1]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>225</th>\n",
" <td>Locht nicht mehr</td>\n",
" <td>16</td>\n",
" <td>1</td>\n",
" <td>[338]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>226</th>\n",
" <td>Maschine stellt immer wieder ab</td>\n",
" <td>31</td>\n",
" <td>1</td>\n",
" <td>[338]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>390</th>\n",
" <td>Gesetzliche Wartung Prüfung Anlagenprüfung Dru...</td>\n",
" <td>56</td>\n",
" <td>1</td>\n",
" <td>[547]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>391 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" descr len num_occur \\\n",
"60 Tägliche Interne Wartungstätigkeiten Weberei 44 92719 \n",
"10 01 Interne Reinigung Pflege Überprüfung 39 11250 \n",
"28 02 Interne Reinigung Pflege Überprüfung 39 3263 \n",
"29 Maschinen-Wartung wöchentlich 29 2408 \n",
"46 Gesetzliche Wartung Prüfung jährlich 36 2403 \n",
".. ... .. ... \n",
"222 Walze WK 03 Umlenkwalze zapfen 30 1 \n",
"224 Leiter Nr. 90 und überprüfen 28 1 \n",
"225 Locht nicht mehr 16 1 \n",
"226 Maschine stellt immer wieder ab 31 1 \n",
"390 Gesetzliche Wartung Prüfung Anlagenprüfung Dru... 56 1 \n",
"\n",
" assoc_obj_ids num_assoc_obj_ids \n",
"60 [0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53... 206 \n",
"10 [0, 7, 425, 426, 427, 428, 429, 517, 518, 576,... 349 \n",
"28 [576, 906, 910, 940, 941, 942, 943, 1040, 1041... 52 \n",
"29 [1, 301, 305, 313, 314, 331, 332, 510, 511, 51... 25 \n",
"46 [0, 191, 193, 195, 197, 200, 287, 288, 289, 29... 638 \n",
".. ... ... \n",
"222 [1] 1 \n",
"224 [1] 1 \n",
"225 [338] 1 \n",
"226 [338] 1 \n",
"390 [547] 1 \n",
"\n",
"[391 rows x 5 columns]"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp1"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"# save/load dataframe\n",
"FILE_PATH = f'{feature}_analyse_1.fth'\n",
"if LOAD_CALC_FILES:\n",
"    temp1 = pd.read_feather(FILE_PATH)\n",
"    temp1 = temp1.set_index('index')\n",
"else:\n",
"    save_df = temp1.reset_index()\n",
"    save_df.to_feather(FILE_PATH)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Entire dataset"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"# analyze all entries (the commented-out slice below restricts the range)\n",
"descr = temp1[['descr', 'num_occur']]\n",
"#descr = descr.iloc[50:200,:]"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
"#descr.iat[0,0] = 'Das ist ein Test am 24.08.2023'"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"391"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(descr)"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>descr</th>\n",
" <th>num_occur</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>60</th>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>92719</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>01 Interne Reinigung Pflege Überprüfung</td>\n",
" <td>11250</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>02 Interne Reinigung Pflege Überprüfung</td>\n",
" <td>3263</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>Maschinen-Wartung wöchentlich</td>\n",
" <td>2408</td>\n",
" </tr>\n",
" <tr>\n",
" <th>46</th>\n",
" <td>Gesetzliche Wartung Prüfung jährlich</td>\n",
" <td>2403</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>222</th>\n",
" <td>Walze WK 03 Umlenkwalze zapfen</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>224</th>\n",
" <td>Leiter Nr. 90 und überprüfen</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>225</th>\n",
" <td>Locht nicht mehr</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>226</th>\n",
" <td>Maschine stellt immer wieder ab</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>390</th>\n",
" <td>Gesetzliche Wartung Prüfung Anlagenprüfung Dru...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>391 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" descr num_occur\n",
"60 Tägliche Interne Wartungstätigkeiten Weberei 92719\n",
"10 01 Interne Reinigung Pflege Überprüfung 11250\n",
"28 02 Interne Reinigung Pflege Überprüfung 3263\n",
"29 Maschinen-Wartung wöchentlich 2408\n",
"46 Gesetzliche Wartung Prüfung jährlich 2403\n",
".. ... ...\n",
"222 Walze WK 03 Umlenkwalze zapfen 1\n",
"224 Leiter Nr. 90 und überprüfen 1\n",
"225 Locht nicht mehr 1\n",
"226 Maschine stellt immer wieder ab 1\n",
"390 Gesetzliche Wartung Prüfung Anlagenprüfung Dru... 1\n",
"\n",
"[391 rows x 2 columns]"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"descr"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [],
"source": [
"#LOAD_CALC_FILES = True\n",
"#LOAD_CALC_FILES = False\n",
"#IS_TEST = True\n",
"IS_TEST = False"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:base:Number of entries processed: 1, Percent completed: 0.26\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:base:Number of entries processed: 101, Percent completed: 25.83\n",
"INFO:base:Number of entries processed: 201, Percent completed: 51.41\n",
"INFO:base:Number of entries processed: 301, Percent completed: 76.98\n"
]
}
],
"source": [
"# adjacency matrix\n",
"connections = dict()\n",
"unique_tokens = set()\n",
"UPDATE_STATUS = 100\n",
"length_data = len(descr)\n",
"spell_check_candidates = set()\n",
"spell_checker = SpellChecker(language='de', distance=1)\n",
"\n",
"if not LOAD_CALC_FILES or IS_TEST:\n",
" for count, description in enumerate(descr.iterrows()):\n",
" \n",
" text = description[1]['descr']\n",
" weight = description[1]['num_occur']\n",
" \n",
" doc = nlp(text)\n",
" \n",
" obtain_descendant_info(\n",
" doc=doc,\n",
" weight=weight,\n",
" POS_of_interest=POS_of_interest,\n",
" TAG_of_interest=TAG_of_interest,\n",
" connections=connections,\n",
" unique_tokens=unique_tokens,\n",
" spell_check_candidates=spell_check_candidates,\n",
" spell_check_whitelist=spell_check_whitelist,\n",
" spell_checker=spell_checker,\n",
" corrections=corrections,\n",
" )\n",
" \n",
" if count % UPDATE_STATUS == 0:\n",
" logger.info(f'Number of entries processed: {count+1}, Percent completed: {((count+1) / length_data) * 100:.2f}')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [],
"source": [
"ADJ_DF_PATH = f'./Graphanalyse/adj_mat_df_{feature}.fth'\n",
"if not IS_TEST:\n",
" if LOAD_CALC_FILES:\n",
" adj_mat_undir = pd.read_feather(ADJ_DF_PATH)\n",
" adj_mat_undir = adj_mat_undir.set_index('index')\n",
" # additional information\n",
" connections = load_pickle('connections.pkl')\n",
" unique_tokens = load_pickle('unique_tokens.pkl')\n",
" else:\n",
" adj_mat = obtain_adj_matrix(unique_tokens=unique_tokens, connections=connections)\n",
" adj_mat_undir = make_undir_adj_matrix(adj_mat=adj_mat)\n",
" save_df = adj_mat_undir.reset_index()\n",
" save_df.to_feather(ADJ_DF_PATH)\n",
" # additional information\n",
" save_pickle(obj=connections, path='connections.pkl')\n",
" save_pickle(obj=unique_tokens, path='unique_tokens.pkl')"
]
},
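{
"cell_type": "markdown",
"metadata": {},
"source": [
"The helpers `obtain_adj_matrix` and `make_undir_adj_matrix` are defined outside this section. As a minimal sketch only, assuming `connections` maps `(source, target)` token pairs to summed weights (an assumption; the real structure may differ), a directed matrix could be built and symmetrised as follows:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# hypothetical example data; the real `connections` structure may differ\n",
"connections = {('Walze', 'überziehen'): 1, ('Leiter', 'überprüfen'): 1}\n",
"unique_tokens = sorted({tok for pair in connections for tok in pair})\n",
"\n",
"# directed adjacency matrix: rows are sources, columns are targets\n",
"adj = pd.DataFrame(0, index=unique_tokens, columns=unique_tokens)\n",
"for (src, dst), weight in connections.items():\n",
"    adj.loc[src, dst] += weight\n",
"\n",
"# undirected version: add the transpose so both directions carry the weight\n",
"adj_undir = adj + adj.T\n",
"```\n",
"\n",
"Adding the transpose is only one possible way to make the matrix undirected; note that it doubles any self-loop weights on the diagonal."
]
},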
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>lecken</th>\n",
" <th>WC</th>\n",
" <th>LKW</th>\n",
" <th>offen</th>\n",
" <th>Maschinen-Reinigung</th>\n",
" <th>Dockenwickler</th>\n",
" <th>halb-jährlich</th>\n",
" <th>Tisch</th>\n",
" <th>zentral</th>\n",
" <th>anbringen</th>\n",
" <th>...</th>\n",
" <th>undicht-</th>\n",
" <th>Platine</th>\n",
" <th>erneuern</th>\n",
" <th>Verschmutzung</th>\n",
" <th>befestigen</th>\n",
" <th>wechseln</th>\n",
" <th>Labor</th>\n",
" <th>Walze</th>\n",
" <th>anfahren</th>\n",
" <th>Leiter</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>12-monatige-Inspektion</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2-monatlich</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2-wöchentlich</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24-monatige-Inspektion</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3-jährlich</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Ölwechsel</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Überprüfung</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>äußerer</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>überprüfen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>überziehen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>390 rows × 390 columns</p>\n",
"</div>"
],
"text/plain": [
" lecken WC LKW offen Maschinen-Reinigung \\\n",
"12-monatige-Inspektion 0 0 0 0 0 \n",
"2-monatlich 0 0 0 0 0 \n",
"2-wöchentlich 0 0 0 0 0 \n",
"24-monatige-Inspektion 0 0 0 0 0 \n",
"3-jährlich 0 0 0 0 0 \n",
"... ... .. ... ... ... \n",
"Ölwechsel 0 0 0 0 0 \n",
"Überprüfung 0 0 0 0 0 \n",
"äußerer 0 0 0 0 0 \n",
"überprüfen 0 0 0 0 0 \n",
"überziehen 0 0 0 0 0 \n",
"\n",
" Dockenwickler halb-jährlich Tisch zentral \\\n",
"12-monatige-Inspektion 0 0 0 0 \n",
"2-monatlich 0 0 0 0 \n",
"2-wöchentlich 0 0 0 0 \n",
"24-monatige-Inspektion 0 0 0 0 \n",
"3-jährlich 0 0 0 0 \n",
"... ... ... ... ... \n",
"Ölwechsel 0 0 0 0 \n",
"Überprüfung 0 0 0 0 \n",
"äußerer 0 0 0 0 \n",
"überprüfen 0 0 0 0 \n",
"überziehen 0 0 0 0 \n",
"\n",
" anbringen ... undicht- Platine erneuern \\\n",
"12-monatige-Inspektion 0 ... 0 0 0 \n",
"2-monatlich 0 ... 0 0 0 \n",
"2-wöchentlich 0 ... 0 0 0 \n",
"24-monatige-Inspektion 0 ... 0 0 0 \n",
"3-jährlich 0 ... 0 0 0 \n",
"... ... ... ... ... ... \n",
"Ölwechsel 0 ... 0 0 0 \n",
"Überprüfung 0 ... 0 0 0 \n",
"äußerer 0 ... 0 0 0 \n",
"überprüfen 0 ... 0 0 0 \n",
"überziehen 0 ... 0 0 0 \n",
"\n",
" Verschmutzung befestigen wechseln Labor Walze \\\n",
"12-monatige-Inspektion 0 0 0 0 0 \n",
"2-monatlich 0 0 0 0 0 \n",
"2-wöchentlich 0 0 0 0 0 \n",
"24-monatige-Inspektion 0 0 0 0 0 \n",
"3-jährlich 0 0 0 0 0 \n",
"... ... ... ... ... ... \n",
"Ölwechsel 0 0 0 0 0 \n",
"Überprüfung 0 0 0 0 0 \n",
"äußerer 0 0 0 0 0 \n",
"überprüfen 0 0 0 0 0 \n",
"überziehen 0 0 0 0 1 \n",
"\n",
" anfahren Leiter \n",
"12-monatige-Inspektion 0 0 \n",
"2-monatlich 0 0 \n",
"2-wöchentlich 0 0 \n",
"24-monatige-Inspektion 0 0 \n",
"3-jährlich 0 0 \n",
"... ... ... \n",
"Ölwechsel 0 0 \n",
"Überprüfung 0 0 \n",
"äußerer 0 0 \n",
"überprüfen 0 1 \n",
"überziehen 0 0 \n",
"\n",
"[390 rows x 390 columns]"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adj_mat_undir.sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [],
"source": [
"arr = adj_mat_undir.to_numpy()"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"391"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.count_nonzero(arr)"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"92964"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.max(arr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Threshold"
]
},
{
"cell_type": "code",
"execution_count": 162,
"metadata": {},
"outputs": [],
"source": [
"WEIGHT_THRESHOLD = 0"
]
},
{
"cell_type": "code",
"execution_count": 163,
"metadata": {},
"outputs": [],
"source": [
"arr = adj_mat_undir.to_numpy()"
]
},
{
"cell_type": "code",
"execution_count": 164,
"metadata": {},
"outputs": [],
"source": [
"arr = np.where(arr < WEIGHT_THRESHOLD, 0, arr)"
]
},
{
"cell_type": "code",
"execution_count": 165,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"391"
]
},
"execution_count": 165,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.count_nonzero(arr)"
]
},
{
"cell_type": "code",
"execution_count": 166,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"233"
]
},
"execution_count": 166,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp = np.sum(arr, axis=0)\n",
"np.count_nonzero(temp)"
]
},
{
"cell_type": "code",
"execution_count": 167,
"metadata": {},
"outputs": [],
"source": [
"thresh_adj_mat = adj_mat_undir.copy()\n",
"thresh_adj_mat.loc[:] = arr"
]
},
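{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `WEIGHT_THRESHOLD = 0` the `np.where` call above leaves the matrix unchanged, because no weight is below zero. A minimal sketch with made-up weights (not taken from the data set) illustrates what a positive threshold would do:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# made-up symmetric weights for three tokens\n",
"arr = np.array([[0, 5, 1],\n",
"                [5, 0, 0],\n",
"                [1, 0, 0]])\n",
"threshold = 2  # hypothetical value\n",
"\n",
"# zero out every edge whose weight is below the threshold\n",
"thresh = np.where(arr < threshold, 0, arr)\n",
"\n",
"# each undirected edge is counted twice in a symmetric matrix\n",
"num_nonzero = np.count_nonzero(thresh)\n",
"# tokens that keep at least one edge after thresholding\n",
"num_connected = np.count_nonzero(thresh.sum(axis=0))\n",
"```\n",
"\n",
"Here the edge with weight 1 is dropped, so two of the three tokens remain connected."
]
},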
{
"cell_type": "code",
"execution_count": 168,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Wasserleitung</th>\n",
" <th>wechseln</th>\n",
" <th>Winkelpositionsgeber</th>\n",
" <th>Klimaanlagengerät</th>\n",
" <th>versetzen</th>\n",
" <th>Brennschlitten</th>\n",
" <th>feststellen</th>\n",
" <th>Stuhl</th>\n",
" <th>monatlich</th>\n",
" <th>anfertigen</th>\n",
" <th>...</th>\n",
" <th>Zahnriemen</th>\n",
" <th>Rampe</th>\n",
" <th>Tisch</th>\n",
" <th>defekt</th>\n",
" <th>Elektrische</th>\n",
" <th>haben</th>\n",
" <th>Wasserenthärtungsanlage</th>\n",
" <th>Gestank</th>\n",
" <th>Zahnrad</th>\n",
" <th>hydraulisch</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Wasserleitung</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>wechseln</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Winkelpositionsgeber</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Klimaanlagengerät</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>versetzen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>haben</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Wasserenthärtungsanlage</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Gestank</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Zahnrad</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>hydraulisch</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>390 rows × 390 columns</p>\n",
"</div>"
],
"text/plain": [
" Wasserleitung wechseln Winkelpositionsgeber \\\n",
"Wasserleitung 0 0 0 \n",
"wechseln 0 0 0 \n",
"Winkelpositionsgeber 0 0 0 \n",
"Klimaanlagengerät 0 0 0 \n",
"versetzen 0 0 0 \n",
"... ... ... ... \n",
"haben 0 0 0 \n",
"Wasserenthärtungsanlage 0 0 0 \n",
"Gestank 0 0 0 \n",
"Zahnrad 0 0 0 \n",
"hydraulisch 0 0 0 \n",
"\n",
" Klimaanlagengerät versetzen Brennschlitten \\\n",
"Wasserleitung 0 0 0 \n",
"wechseln 0 0 0 \n",
"Winkelpositionsgeber 0 0 0 \n",
"Klimaanlagengerät 0 0 0 \n",
"versetzen 0 0 0 \n",
"... ... ... ... \n",
"haben 0 0 0 \n",
"Wasserenthärtungsanlage 0 0 0 \n",
"Gestank 0 0 0 \n",
"Zahnrad 0 0 0 \n",
"hydraulisch 0 0 0 \n",
"\n",
" feststellen Stuhl monatlich anfertigen ... \\\n",
"Wasserleitung 0 0 0 0 ... \n",
"wechseln 0 0 0 0 ... \n",
"Winkelpositionsgeber 0 0 0 0 ... \n",
"Klimaanlagengerät 0 0 0 0 ... \n",
"versetzen 0 0 0 0 ... \n",
"... ... ... ... ... ... \n",
"haben 0 0 0 0 ... \n",
"Wasserenthärtungsanlage 0 0 0 0 ... \n",
"Gestank 0 0 0 0 ... \n",
"Zahnrad 0 0 0 0 ... \n",
"hydraulisch 0 0 0 0 ... \n",
"\n",
" Zahnriemen Rampe Tisch defekt Elektrische haben \\\n",
"Wasserleitung 0 0 0 0 0 0 \n",
"wechseln 0 0 0 0 0 0 \n",
"Winkelpositionsgeber 0 0 0 1 0 0 \n",
"Klimaanlagengerät 0 0 0 0 0 0 \n",
"versetzen 0 0 0 0 0 0 \n",
"... ... ... ... ... ... ... \n",
"haben 0 0 0 0 0 0 \n",
"Wasserenthärtungsanlage 0 0 0 0 0 0 \n",
"Gestank 0 0 0 0 0 0 \n",
"Zahnrad 0 0 0 0 0 0 \n",
"hydraulisch 0 0 0 0 0 0 \n",
"\n",
" Wasserenthärtungsanlage Gestank Zahnrad \\\n",
"Wasserleitung 0 0 0 \n",
"wechseln 0 0 0 \n",
"Winkelpositionsgeber 0 0 0 \n",
"Klimaanlagengerät 0 0 0 \n",
"versetzen 0 0 0 \n",
"... ... ... ... \n",
"haben 0 0 0 \n",
"Wasserenthärtungsanlage 0 0 0 \n",
"Gestank 0 0 0 \n",
"Zahnrad 0 0 0 \n",
"hydraulisch 0 0 0 \n",
"\n",
" hydraulisch \n",
"Wasserleitung 0 \n",
"wechseln 0 \n",
"Winkelpositionsgeber 0 \n",
"Klimaanlagengerät 0 \n",
"versetzen 0 \n",
"... ... \n",
"haben 0 \n",
"Wasserenthärtungsanlage 0 \n",
"Gestank 0 \n",
"Zahnrad 0 \n",
"hydraulisch 0 \n",
"\n",
"[390 rows x 390 columns]"
]
},
"execution_count": 168,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"thresh_adj_mat"
]
},
{
"cell_type": "code",
"execution_count": 169,
"metadata": {},
"outputs": [],
"source": [
"ADJ_MAT_PATH_CSV = f'./Graphanalyse/adj_mat_thresh_{feature}_{WEIGHT_THRESHOLD}.csv'\n",
"thresh_adj_mat.to_csv(path_or_buf=ADJ_MAT_PATH_CSV, encoding='cp1252', sep=';')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"# Merkmal 3: ErledigungsBeschreibung"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [],
"source": [
"feature = 'ErledigungsBeschreibung'"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [],
"source": [
"base = wo_duplicates.copy()\n",
"base = base.dropna(axis=0, subset=feature)\n",
"base[feature] = base[feature].map(clean_string)"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>VorgangsID</th>\n",
" <th>ObjektID</th>\n",
" <th>HObjektText</th>\n",
" <th>ObjektArtID</th>\n",
" <th>ObjektArtText</th>\n",
" <th>VorgangsTypID</th>\n",
" <th>VorgangsTypName</th>\n",
" <th>VorgangsDatum</th>\n",
" <th>VorgangsStatusId</th>\n",
" <th>VorgangsPrioritaet</th>\n",
" <th>VorgangsBeschreibung</th>\n",
" <th>VorgangsOrt</th>\n",
" <th>VorgangsArtText</th>\n",
" <th>ErledigungsDatum</th>\n",
" <th>ErledigungsArtText</th>\n",
" <th>ErledigungsBeschreibung</th>\n",
" <th>MPMelderArbeitsplatz</th>\n",
" <th>MPAbteilungBezeichnung</th>\n",
" <th>Arbeitsbeginn</th>\n",
" <th>ErstellungsDatum</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>58</td>\n",
" <td>257</td>\n",
" <td>107, Webmaschine, OM 220 EOS</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-21</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Gegengewicht wieder anbringen</td>\n",
" <td>NaN</td>\n",
" <td>Gegengewicht an der Webmaschine abgefallen</td>\n",
" <td>2019-03-21</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>Schraube ausgebohrt Gegengewicht wieder angebr...</td>\n",
" <td>Weberei</td>\n",
" <td>Weberei</td>\n",
" <td>2019-03-21</td>\n",
" <td>2019-03-21</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>81</td>\n",
" <td>138</td>\n",
" <td>00138, Schärmaschine 9,</td>\n",
" <td>16</td>\n",
" <td>Schärmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-25</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>da ist etwas gebrochen. (Herr Heininger)</td>\n",
" <td>NaN</td>\n",
" <td>zentrale Bremsenverstellung linke Gatterseite ...</td>\n",
" <td>2019-03-25</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>Bolzen gebrochen. Bolzen neu angefertig und di...</td>\n",
" <td>Vorwerk</td>\n",
" <td>Vorwerk</td>\n",
" <td>2019-03-25</td>\n",
" <td>2019-03-25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>82</td>\n",
" <td>0</td>\n",
" <td>Warenschau allgemein</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-25</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Klappbügel Portalkran H31 defekt</td>\n",
" <td>Warenschau allgemein</td>\n",
" <td>Allgemeine Reparaturarbeiten</td>\n",
" <td>2019-03-25</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>Feder ausgetauscht</td>\n",
" <td>Warenschau</td>\n",
" <td>Warenschau</td>\n",
" <td>2019-03-25</td>\n",
" <td>2019-03-25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>76</td>\n",
" <td>0</td>\n",
" <td>Neben der Türe</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-03-22</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Schraube nix mer gut</td>\n",
" <td>Neben der Türe</td>\n",
" <td>Kettbaum</td>\n",
" <td>2019-03-25</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>Schrauben ausgebohrt Gewinde nachgeschnitten</td>\n",
" <td>Vorwerk</td>\n",
" <td>Vorwerk</td>\n",
" <td>2019-03-25</td>\n",
" <td>2019-03-22</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>111</td>\n",
" <td>241</td>\n",
" <td>294 C, Webmaschine, SG 240 EMS</td>\n",
" <td>5</td>\n",
" <td>Greifer-Webmaschine</td>\n",
" <td>3</td>\n",
" <td>Reparaturauftrag (Portal)</td>\n",
" <td>2019-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>KBK tauschen\\nUrsache vermutlich mechanisch</td>\n",
" <td>NaN</td>\n",
" <td>Kupplung-Brems-Kombination</td>\n",
" <td>2019-04-08</td>\n",
" <td>Reparatur UTT</td>\n",
" <td>da derzeit Keine Ersatzteile da Reparatur mit ...</td>\n",
" <td>Weberei</td>\n",
" <td>Weberei</td>\n",
" <td>2019-04-02</td>\n",
" <td>2019-04-01</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" VorgangsID ObjektID HObjektText ObjektArtID \\\n",
"3 58 257 107, Webmaschine, OM 220 EOS 3 \n",
"4 81 138 00138, Schärmaschine 9, 16 \n",
"5 82 0 Warenschau allgemein 0 \n",
"6 76 0 Neben der Türe 0 \n",
"8 111 241 294 C, Webmaschine, SG 240 EMS 5 \n",
"\n",
" ObjektArtText VorgangsTypID VorgangsTypName \\\n",
"3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n",
"4 Schärmaschine 3 Reparaturauftrag (Portal) \n",
"5 NaN 3 Reparaturauftrag (Portal) \n",
"6 NaN 3 Reparaturauftrag (Portal) \n",
"8 Greifer-Webmaschine 3 Reparaturauftrag (Portal) \n",
"\n",
" VorgangsDatum VorgangsStatusId VorgangsPrioritaet \\\n",
"3 2019-03-21 5 0 \n",
"4 2019-03-25 5 0 \n",
"5 2019-03-25 5 0 \n",
"6 2019-03-22 5 0 \n",
"8 2019-04-01 5 0 \n",
"\n",
" VorgangsBeschreibung VorgangsOrt \\\n",
"3 Gegengewicht wieder anbringen NaN \n",
"4 da ist etwas gebrochen. (Herr Heininger) NaN \n",
"5 Klappbügel Portalkran H31 defekt Warenschau allgemein \n",
"6 Schraube nix mer gut Neben der Türe \n",
"8 KBK tauschen\\nUrsache vermutlich mechanisch NaN \n",
"\n",
" VorgangsArtText ErledigungsDatum \\\n",
"3 Gegengewicht an der Webmaschine abgefallen 2019-03-21 \n",
"4 zentrale Bremsenverstellung linke Gatterseite ... 2019-03-25 \n",
"5 Allgemeine Reparaturarbeiten 2019-03-25 \n",
"6 Kettbaum 2019-03-25 \n",
"8 Kupplung-Brems-Kombination 2019-04-08 \n",
"\n",
" ErledigungsArtText ErledigungsBeschreibung \\\n",
"3 Reparatur UTT Schraube ausgebohrt Gegengewicht wieder angebr... \n",
"4 Reparatur UTT Bolzen gebrochen. Bolzen neu angefertig und di... \n",
"5 Reparatur UTT Feder ausgetauscht \n",
"6 Reparatur UTT Schrauben ausgebohrt Gewinde nachgeschnitten \n",
"8 Reparatur UTT da derzeit Keine Ersatzteile da Reparatur mit ... \n",
"\n",
" MPMelderArbeitsplatz MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n",
"3 Weberei Weberei 2019-03-21 2019-03-21 \n",
"4 Vorwerk Vorwerk 2019-03-25 2019-03-25 \n",
"5 Warenschau Warenschau 2019-03-25 2019-03-25 \n",
"6 Vorwerk Vorwerk 2019-03-25 2019-03-22 \n",
"8 Weberei Weberei 2019-04-02 2019-04-01 "
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"base.head()"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Einträge: 118086\n"
]
}
],
"source": [
"descriptions = base[feature]\n",
"print(f\"Einträge: {len(descriptions)}\")"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Anzahl Duplikate ErledigungsBeschreibung: 110707\n",
"Anzahl einzigartiger ErledigungsBeschreibung: 7379\n",
"Anteil einzigartiger ErledigungsBeschreibung: 6.25 %\n"
]
}
],
"source": [
"num_dupl_descr = descriptions.duplicated().sum()\n",
"uni_descr = descriptions.unique()\n",
"num_uni_descr = len(uni_descr)\n",
"\n",
"print(f\"Anzahl Duplikate {feature}: {num_dupl_descr}\")\n",
"print(f\"Anzahl einzigartiger {feature}: {num_uni_descr}\")\n",
"print(f\"Anteil einzigartiger {feature}: {num_uni_descr / len(descriptions) * 100:.2f} %\")"
]
},
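{
"cell_type": "markdown",
"metadata": {},
"source": [
"`duplicated()` marks every repeat after the first occurrence, so the duplicate count plus the unique count equals the number of entries (110707 + 7379 = 118086 above). A small sketch on made-up descriptions shows the relation:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# made-up descriptions, not taken from the data set\n",
"s = pd.Series(['ok', 'ok', 'Feder getauscht', 'ok'])\n",
"\n",
"num_dupl = int(s.duplicated().sum())  # repeats after the first occurrence\n",
"num_uni = len(s.unique())\n",
"\n",
"# duplicates and unique values together account for every entry\n",
"total = num_dupl + num_uni\n",
"```"
]
},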
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"LOAD_CALC_FILES"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [],
"source": [
"if not LOAD_CALC_FILES:\n",
" cols = ['descr', 'len', 'num_occur', 'assoc_obj_ids', 'num_assoc_obj_ids']\n",
" descr_df = pd.DataFrame(columns=cols)\n",
" max_val = 0\n",
" text = None\n",
" index = 0\n",
"\n",
"\n",
" for idx, description in enumerate(uni_descr):\n",
" len_descr = len(description)\n",
" filt = base[feature] == description\n",
" temp = base[filt]\n",
" assoc_obj_ids = temp['ObjektID'].unique()\n",
" assoc_obj_ids = np.sort(assoc_obj_ids, kind='stable')\n",
" num_assoc_obj_ids = len(assoc_obj_ids)\n",
" num_dupl = filt.sum()\n",
" \n",
" conc_df = pd.DataFrame(data=[[\n",
" description,\n",
" len_descr,\n",
" num_dupl,\n",
" assoc_obj_ids,\n",
" num_assoc_obj_ids\n",
" ]], columns=cols)\n",
" \n",
" descr_df = pd.concat([descr_df, conc_df], ignore_index=True)\n",
" \n",
" if num_dupl > max_val:\n",
" max_val = num_dupl\n",
" index = idx\n",
" text = description\n",
" \n",
" temp1 = descr_df.sort_values(by='num_occur', ascending=False)"
]
},
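{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above filters `base` once per unique description and grows `descr_df` with `pd.concat`, which may become slow for several thousand descriptions. A possible alternative, shown as a sketch on made-up data (the column names follow the notebook, the values do not), is a single `groupby`:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# made-up stand-in for `base`\n",
"base = pd.DataFrame({\n",
"    'ObjektID': [1, 2, 1, 3],\n",
"    'ErledigungsBeschreibung': ['ok', 'ok', 'Feder getauscht', 'ok'],\n",
"})\n",
"feature = 'ErledigungsBeschreibung'\n",
"\n",
"# sort=False keeps the groups in order of first appearance\n",
"grouped = base.groupby(feature, sort=False)['ObjektID']\n",
"descr_df = pd.DataFrame({\n",
"    'num_occur': grouped.size(),\n",
"    'assoc_obj_ids': grouped.unique().map(np.sort),\n",
"    'num_assoc_obj_ids': grouped.nunique(),\n",
"}).rename_axis('descr').reset_index()\n",
"descr_df['len'] = descr_df['descr'].str.len()\n",
"\n",
"temp1 = descr_df.sort_values(by='num_occur', ascending=False)\n",
"```\n",
"\n",
"The column order differs from the loop version, and ties in `num_occur` may be ordered differently, so this is an illustration rather than a drop-in replacement."
]
},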
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>descr</th>\n",
" <th>len</th>\n",
" <th>num_occur</th>\n",
" <th>assoc_obj_ids</th>\n",
" <th>num_assoc_obj_ids</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>112</th>\n",
" <td>Sichtkontrolle durchgeführt Auffälligkeiten fe...</td>\n",
" <td>95</td>\n",
" <td>98720</td>\n",
" <td>[0, 1, 7, 17, 41, 42, 43, 44, 45, 46, 47, 51, ...</td>\n",
" <td>953</td>\n",
" </tr>\n",
" <tr>\n",
" <th>108</th>\n",
" <td>Sichtkontrolle durchgeführt Auffälligkeiten fe...</td>\n",
" <td>100</td>\n",
" <td>1450</td>\n",
" <td>[0, 1, 140, 301, 305, 313, 314, 576, 970, 1110...</td>\n",
" <td>28</td>\n",
" </tr>\n",
" <tr>\n",
" <th>147</th>\n",
" <td>Externe Prüfung wurde durchgeführt Beanstandun...</td>\n",
" <td>119</td>\n",
" <td>1082</td>\n",
" <td>[191, 193, 195, 197, 200, 264, 287, 288, 289, ...</td>\n",
" <td>413</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128</th>\n",
" <td>Reinigung durchgeführt Auffälligkeiten festges...</td>\n",
" <td>90</td>\n",
" <td>762</td>\n",
" <td>[0, 1, 7, 123, 136, 137, 138, 177, 298, 304, 3...</td>\n",
" <td>90</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>Sichtkontrolle wie festgelegt durchgeführt Auf...</td>\n",
" <td>110</td>\n",
" <td>648</td>\n",
" <td>[1, 20, 21, 51, 52, 53, 54, 55, 56, 64, 65, 66...</td>\n",
" <td>271</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2805</th>\n",
" <td>X Achse Süd Führungswägen Kurze Version eingebaut</td>\n",
" <td>49</td>\n",
" <td>1</td>\n",
" <td>[21]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2804</th>\n",
" <td>Maschinenrahmen ausgerichtet und ausgebeult. M...</td>\n",
" <td>90</td>\n",
" <td>1</td>\n",
" <td>[144]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2803</th>\n",
" <td>Bügel und Stützräder getauscht</td>\n",
" <td>30</td>\n",
" <td>1</td>\n",
" <td>[315]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2802</th>\n",
" <td>Graf: TK wurde in Arbeitsauftrag 65487 gewandelt</td>\n",
" <td>48</td>\n",
" <td>1</td>\n",
" <td>[405]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7378</th>\n",
" <td>Neue Gasfeder eingebaut</td>\n",
" <td>23</td>\n",
" <td>1</td>\n",
" <td>[326]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>7379 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" descr len num_occur \\\n",
"112 Sichtkontrolle durchgeführt Auffälligkeiten fe... 95 98720 \n",
"108 Sichtkontrolle durchgeführt Auffälligkeiten fe... 100 1450 \n",
"147 Externe Prüfung wurde durchgeführt Beanstandun... 119 1082 \n",
"128 Reinigung durchgeführt Auffälligkeiten festges... 90 762 \n",
"96 Sichtkontrolle wie festgelegt durchgeführt Auf... 110 648 \n",
"... ... ... ... \n",
"2805 X Achse Süd Führungswägen Kurze Version eingebaut 49 1 \n",
"2804 Maschinenrahmen ausgerichtet und ausgebeult. M... 90 1 \n",
"2803 Bügel und Stützräder getauscht 30 1 \n",
"2802 Graf: TK wurde in Arbeitsauftrag 65487 gewandelt 48 1 \n",
"7378 Neue Gasfeder eingebaut 23 1 \n",
"\n",
" assoc_obj_ids num_assoc_obj_ids \n",
"112 [0, 1, 7, 17, 41, 42, 43, 44, 45, 46, 47, 51, ... 953 \n",
"108 [0, 1, 140, 301, 305, 313, 314, 576, 970, 1110... 28 \n",
"147 [191, 193, 195, 197, 200, 264, 287, 288, 289, ... 413 \n",
"128 [0, 1, 7, 123, 136, 137, 138, 177, 298, 304, 3... 90 \n",
"96 [1, 20, 21, 51, 52, 53, 54, 55, 56, 64, 65, 66... 271 \n",
"... ... ... \n",
"2805 [21] 1 \n",
"2804 [144] 1 \n",
"2803 [315] 1 \n",
"2802 [405] 1 \n",
"7378 [326] 1 \n",
"\n",
"[7379 rows x 5 columns]"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp1"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Sichtkontrolle durchgeführt Auffälligkeiten festgestellt vom Ausführenden bitte dazu schreiben:'"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp1.iat[0,0]"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Sichtkontrolle durchgeführt Auffälligkeiten festgestellt vom Ausführenden bitte dazu schreiben: Nein'"
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp1.iat[1,0]"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [],
"source": [
"# save/load dataframe\n",
"FILE_PATH = f'{feature}_analyse_1.fth'\n",
"if LOAD_CALC_FILES:\n",
" temp1 = pd.read_feather(FILE_PATH)\n",
" temp1 = temp1.set_index('index')\n",
"else:\n",
" save_df = temp1.reset_index()\n",
" save_df.to_feather(FILE_PATH)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Entire dataset"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [],
"source": [
"# analyze all entries (enable the slice below to restrict to a subset)\n",
"descr = temp1[['descr', 'num_occur']]\n",
"#descr = descr.iloc[50:200,:]"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {},
"outputs": [],
"source": [
"#descr.iat[0,0] = 'Das ist ein Test am 24.08.2023'"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"7379"
]
},
"execution_count": 86,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(descr)"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>descr</th>\n",
" <th>num_occur</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>112</th>\n",
" <td>Sichtkontrolle durchgeführt Auffälligkeiten fe...</td>\n",
" <td>98720</td>\n",
" </tr>\n",
" <tr>\n",
" <th>108</th>\n",
" <td>Sichtkontrolle durchgeführt Auffälligkeiten fe...</td>\n",
" <td>1450</td>\n",
" </tr>\n",
" <tr>\n",
" <th>147</th>\n",
" <td>Externe Prüfung wurde durchgeführt Beanstandun...</td>\n",
" <td>1082</td>\n",
" </tr>\n",
" <tr>\n",
" <th>128</th>\n",
" <td>Reinigung durchgeführt Auffälligkeiten festges...</td>\n",
" <td>762</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>Sichtkontrolle wie festgelegt durchgeführt Auf...</td>\n",
" <td>648</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2805</th>\n",
" <td>X Achse Süd Führungswägen Kurze Version eingebaut</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2804</th>\n",
" <td>Maschinenrahmen ausgerichtet und ausgebeult. M...</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2803</th>\n",
" <td>Bügel und Stützräder getauscht</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2802</th>\n",
" <td>Graf: TK wurde in Arbeitsauftrag 65487 gewandelt</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7378</th>\n",
" <td>Neue Gasfeder eingebaut</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>7379 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" descr num_occur\n",
"112 Sichtkontrolle durchgeführt Auffälligkeiten fe... 98720\n",
"108 Sichtkontrolle durchgeführt Auffälligkeiten fe... 1450\n",
"147 Externe Prüfung wurde durchgeführt Beanstandun... 1082\n",
"128 Reinigung durchgeführt Auffälligkeiten festges... 762\n",
"96 Sichtkontrolle wie festgelegt durchgeführt Auf... 648\n",
"... ... ...\n",
"2805 X Achse Süd Führungswägen Kurze Version eingebaut 1\n",
"2804 Maschinenrahmen ausgerichtet und ausgebeult. M... 1\n",
"2803 Bügel und Stützräder getauscht 1\n",
"2802 Graf: TK wurde in Arbeitsauftrag 65487 gewandelt 1\n",
"7378 Neue Gasfeder eingebaut 1\n",
"\n",
"[7379 rows x 2 columns]"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"descr"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [],
"source": [
"#LOAD_CALC_FILES = True\n",
"#LOAD_CALC_FILES = False\n",
"#IS_TEST = True\n",
"IS_TEST = False"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:base:Number of entries processed: 1, Percent completed: 0.01\n",
"INFO:base:Number of entries processed: 501, Percent completed: 6.79\n",
"INFO:base:Number of entries processed: 1001, Percent completed: 13.57\n",
"INFO:base:Number of entries processed: 1501, Percent completed: 20.34\n",
"INFO:base:Number of entries processed: 2001, Percent completed: 27.12\n",
"INFO:base:Number of entries processed: 2501, Percent completed: 33.89\n",
"INFO:base:Number of entries processed: 3001, Percent completed: 40.67\n",
"INFO:base:Number of entries processed: 3501, Percent completed: 47.45\n",
"INFO:base:Number of entries processed: 4001, Percent completed: 54.22\n",
"INFO:base:Number of entries processed: 4501, Percent completed: 61.00\n",
"INFO:base:Number of entries processed: 5001, Percent completed: 67.77\n",
"INFO:base:Number of entries processed: 5501, Percent completed: 74.55\n",
"INFO:base:Number of entries processed: 6001, Percent completed: 81.33\n",
"INFO:base:Number of entries processed: 6501, Percent completed: 88.10\n",
"INFO:base:Number of entries processed: 7001, Percent completed: 94.88\n"
]
}
],
"source": [
"# collect weighted token connections as input for the adjacency matrix\n",
"connections = dict()\n",
"unique_tokens = set()\n",
"UPDATE_STATUS = 500  # log progress every 500 entries\n",
"length_data = len(descr)\n",
"spell_check_candidates = set()\n",
"spell_checker = SpellChecker(language='de', distance=1)\n",
"\n",
"if not LOAD_CALC_FILES or IS_TEST:\n",
"    for count, (_, row) in enumerate(descr.iterrows()):\n",
"\n",
"        text = row['descr']\n",
"        # the occurrence count of the description is passed on as weight\n",
"        weight = row['num_occur']\n",
"\n",
"        doc = nlp(text)\n",
"\n",
"        obtain_descendant_info(\n",
"            doc=doc,\n",
"            weight=weight,\n",
"            POS_of_interest=POS_of_interest,\n",
"            TAG_of_interest=TAG_of_interest,\n",
"            connections=connections,\n",
"            unique_tokens=unique_tokens,\n",
"            spell_check_candidates=spell_check_candidates,\n",
"            spell_check_whitelist=spell_check_whitelist,\n",
"            spell_checker=spell_checker,\n",
"            corrections=corrections,\n",
"        )\n",
"\n",
"        if count % UPDATE_STATUS == 0:\n",
"            logger.info(f'Number of entries processed: {count+1}, Percent completed: {((count+1) / length_data) * 100:.2f}')"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {},
"outputs": [],
"source": [
"ADJ_DF_PATH = f'./Graphanalyse/adj_mat_df_{feature}.fth'\n",
"if not IS_TEST:\n",
" if LOAD_CALC_FILES:\n",
" adj_mat_undir = pd.read_feather(ADJ_DF_PATH)\n",
" adj_mat_undir = adj_mat_undir.set_index('index')\n",
" # additional information\n",
" connections = load_pickle('connections.pkl')\n",
" unique_tokens = load_pickle('unique_tokens.pkl')\n",
" else:\n",
" adj_mat = obtain_adj_matrix(unique_tokens=unique_tokens, connections=connections)\n",
" adj_mat_undir = make_undir_adj_matrix(adj_mat=adj_mat)\n",
" save_df = adj_mat_undir.reset_index()\n",
" save_df.to_feather(ADJ_DF_PATH)\n",
" # additional information\n",
" save_pickle(obj=connections, path='connections.pkl')\n",
" save_pickle(obj=unique_tokens, path='unique_tokens.pkl')"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>funktionsfähig</th>\n",
" <th>Zwischenbehälter</th>\n",
" <th>Ölfilter</th>\n",
" <th>Rechter</th>\n",
" <th>Kontaktproblem</th>\n",
" <th>Geschweisst</th>\n",
" <th>vorbereiten</th>\n",
" <th>Gelenkbolzen</th>\n",
" <th>Silikonfass</th>\n",
" <th>Ausbau</th>\n",
" <th>...</th>\n",
" <th>Kom</th>\n",
" <th>anlernen</th>\n",
" <th>nah</th>\n",
" <th>Begutachtung</th>\n",
" <th>Betriebszeit</th>\n",
" <th>paletten</th>\n",
" <th>augetreten</th>\n",
" <th>Antriebszahnrad</th>\n",
" <th>Gewindereparaturset</th>\n",
" <th>Heizventil</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>-20C</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>-Befestihgung</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>-Einlaufwalze</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>-Entlüftungssicherung</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>-Faltbalken</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>überzogenn</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>überzoggen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>übrtprüfen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ünerziehen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>üperprüfen</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6946 rows × 6946 columns</p>\n",
"</div>"
],
"text/plain": [
" funktionsfähig Zwischenbehälter Ölfilter Rechter \\\n",
"-20C 0 0 0 0 \n",
"-Befestihgung 0 0 0 0 \n",
"-Einlaufwalze 0 0 0 0 \n",
"-Entlüftungssicherung 0 0 0 0 \n",
"-Faltbalken 0 0 0 0 \n",
"... ... ... ... ... \n",
"überzogenn 0 0 0 0 \n",
"überzoggen 0 0 0 0 \n",
"übrtprüfen 0 0 0 0 \n",
"ünerziehen 0 0 0 0 \n",
"üperprüfen 0 0 0 0 \n",
"\n",
" Kontaktproblem Geschweisst vorbereiten Gelenkbolzen \\\n",
"-20C 0 0 0 0 \n",
"-Befestihgung 0 0 0 0 \n",
"-Einlaufwalze 0 0 0 0 \n",
"-Entlüftungssicherung 0 0 0 0 \n",
"-Faltbalken 0 0 0 0 \n",
"... ... ... ... ... \n",
"überzogenn 0 0 0 0 \n",
"überzoggen 0 0 0 0 \n",
"übrtprüfen 0 0 0 0 \n",
"ünerziehen 0 0 0 0 \n",
"üperprüfen 0 0 0 0 \n",
"\n",
" Silikonfass Ausbau ... Kom anlernen nah \\\n",
"-20C 0 0 ... 0 0 0 \n",
"-Befestihgung 0 0 ... 0 0 0 \n",
"-Einlaufwalze 0 0 ... 0 0 0 \n",
"-Entlüftungssicherung 0 0 ... 0 0 0 \n",
"-Faltbalken 0 0 ... 0 0 0 \n",
"... ... ... ... ... ... ... \n",
"überzogenn 0 0 ... 0 0 0 \n",
"überzoggen 0 0 ... 0 0 0 \n",
"übrtprüfen 0 0 ... 0 0 0 \n",
"ünerziehen 0 0 ... 0 0 0 \n",
"üperprüfen 0 0 ... 0 0 0 \n",
"\n",
" Begutachtung Betriebszeit paletten augetreten \\\n",
"-20C 0 0 0 0 \n",
"-Befestihgung 0 0 0 0 \n",
"-Einlaufwalze 0 0 0 0 \n",
"-Entlüftungssicherung 0 0 0 0 \n",
"-Faltbalken 0 0 0 0 \n",
"... ... ... ... ... \n",
"überzogenn 0 0 0 0 \n",
"überzoggen 0 0 0 0 \n",
"übrtprüfen 0 0 0 0 \n",
"ünerziehen 0 0 0 0 \n",
"üperprüfen 0 0 0 0 \n",
"\n",
" Antriebszahnrad Gewindereparaturset Heizventil \n",
"-20C 0 0 0 \n",
"-Befestihgung 0 0 0 \n",
"-Einlaufwalze 0 0 0 \n",
"-Entlüftungssicherung 0 0 0 \n",
"-Faltbalken 0 0 0 \n",
"... ... ... ... \n",
"überzogenn 0 0 0 \n",
"überzoggen 0 0 0 \n",
"übrtprüfen 0 0 0 \n",
"ünerziehen 0 0 0 \n",
"üperprüfen 0 0 0 \n",
"\n",
"[6946 rows x 6946 columns]"
]
},
"execution_count": 94,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"adj_mat_undir.sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {},
"outputs": [],
"source": [
"arr = adj_mat_undir.to_numpy()"
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"24171"
]
},
"execution_count": 96,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.count_nonzero(arr)"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"103601"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.max(arr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
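"Threshold: set every edge weight below `WEIGHT_THRESHOLD` to zero so that only strongly weighted token connections remain."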
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {},
"outputs": [],
"source": [
"WEIGHT_THRESHOLD = 30"
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {},
"outputs": [],
"source": [
"arr = adj_mat_undir.to_numpy()"
]
},
{
"cell_type": "code",
"execution_count": 112,
"metadata": {},
"outputs": [],
"source": [
"arr = np.where(arr < WEIGHT_THRESHOLD, 0, arr)"
]
},
{
"cell_type": "code",
"execution_count": 113,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"138"
]
},
"execution_count": 113,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.count_nonzero(arr)"
]
},
{
"cell_type": "code",
"execution_count": 116,
"metadata": {},
"outputs": [],
"source": [
"thresh_adj_mat = adj_mat_undir.copy()\n",
"thresh_adj_mat.loc[:] = arr"
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>funktionsfähig</th>\n",
" <th>Zwischenbehälter</th>\n",
" <th>Ölfilter</th>\n",
" <th>Rechter</th>\n",
" <th>Kontaktproblem</th>\n",
" <th>Geschweisst</th>\n",
" <th>vorbereiten</th>\n",
" <th>Gelenkbolzen</th>\n",
" <th>Silikonfass</th>\n",
" <th>Ausbau</th>\n",
" <th>...</th>\n",
" <th>Kom</th>\n",
" <th>anlernen</th>\n",
" <th>nah</th>\n",
" <th>Begutachtung</th>\n",
" <th>Betriebszeit</th>\n",
" <th>paletten</th>\n",
" <th>augetreten</th>\n",
" <th>Antriebszahnrad</th>\n",
" <th>Gewindereparaturset</th>\n",
" <th>Heizventil</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>funktionsfähig</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Zwischenbehälter</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Ölfilter</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Rechter</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Kontaktproblem</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>paletten</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>augetreten</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Antriebszahnrad</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Gewindereparaturset</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Heizventil</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6946 rows × 6946 columns</p>\n",
"</div>"
],
"text/plain": [
" funktionsfähig Zwischenbehälter Ölfilter Rechter \\\n",
"funktionsfähig 0 0 0 0 \n",
"Zwischenbehälter 0 0 0 0 \n",
"Ölfilter 0 0 0 0 \n",
"Rechter 0 0 0 0 \n",
"Kontaktproblem 0 0 0 0 \n",
"... ... ... ... ... \n",
"paletten 0 0 0 0 \n",
"augetreten 0 0 0 0 \n",
"Antriebszahnrad 0 0 0 0 \n",
"Gewindereparaturset 0 0 0 0 \n",
"Heizventil 0 0 0 0 \n",
"\n",
" Kontaktproblem Geschweisst vorbereiten Gelenkbolzen \\\n",
"funktionsfähig 0 0 0 0 \n",
"Zwischenbehälter 0 0 0 0 \n",
"Ölfilter 0 0 0 0 \n",
"Rechter 0 0 0 0 \n",
"Kontaktproblem 0 0 0 0 \n",
"... ... ... ... ... \n",
"paletten 0 0 0 0 \n",
"augetreten 0 0 0 0 \n",
"Antriebszahnrad 0 0 0 0 \n",
"Gewindereparaturset 0 0 0 0 \n",
"Heizventil 0 0 0 0 \n",
"\n",
" Silikonfass Ausbau ... Kom anlernen nah \\\n",
"funktionsfähig 0 0 ... 0 0 0 \n",
"Zwischenbehälter 0 0 ... 0 0 0 \n",
"Ölfilter 0 0 ... 0 0 0 \n",
"Rechter 0 0 ... 0 0 0 \n",
"Kontaktproblem 0 0 ... 0 0 0 \n",
"... ... ... ... ... ... ... \n",
"paletten 0 0 ... 0 0 0 \n",
"augetreten 0 0 ... 0 0 0 \n",
"Antriebszahnrad 0 0 ... 0 0 0 \n",
"Gewindereparaturset 0 0 ... 0 0 0 \n",
"Heizventil 0 0 ... 0 0 0 \n",
"\n",
" Begutachtung Betriebszeit paletten augetreten \\\n",
"funktionsfähig 0 0 0 0 \n",
"Zwischenbehälter 0 0 0 0 \n",
"Ölfilter 0 0 0 0 \n",
"Rechter 0 0 0 0 \n",
"Kontaktproblem 0 0 0 0 \n",
"... ... ... ... ... \n",
"paletten 0 0 0 0 \n",
"augetreten 0 0 0 0 \n",
"Antriebszahnrad 0 0 0 0 \n",
"Gewindereparaturset 0 0 0 0 \n",
"Heizventil 0 0 0 0 \n",
"\n",
" Antriebszahnrad Gewindereparaturset Heizventil \n",
"funktionsfähig 0 0 0 \n",
"Zwischenbehälter 0 0 0 \n",
"Ölfilter 0 0 0 \n",
"Rechter 0 0 0 \n",
"Kontaktproblem 0 0 0 \n",
"... ... ... ... \n",
"paletten 0 0 0 \n",
"augetreten 0 0 0 \n",
"Antriebszahnrad 0 0 0 \n",
"Gewindereparaturset 0 0 0 \n",
"Heizventil 0 0 0 \n",
"\n",
"[6946 rows x 6946 columns]"
]
},
"execution_count": 117,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"thresh_adj_mat"
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {},
"outputs": [],
"source": [
"ADJ_MAT_PATH_CSV = f'./Graphanalyse/adj_mat_thresh_{feature}_{WEIGHT_THRESHOLD}.csv'\n",
"thresh_adj_mat.to_csv(path_or_buf=ADJ_MAT_PATH_CSV, encoding='cp1252', sep=';')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"# **Appendix**\n",
"\n",
"#### **Example analysis of the entry with the most duplicates**"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of entries with the selected description: 47689\n"
]
}
],
"source": [
"crit = uni_descr[171]\n",
"filt = wo_duplicates['VorgangsBeschreibung'] == crit\n",
"temp = wo_duplicates[filt]\n",
"print(f\"Number of entries with the selected description: {len(temp)}\")"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>VorgangsID</th>\n",
" <th>ObjektID</th>\n",
" <th>HObjektText</th>\n",
" <th>ObjektArtID</th>\n",
" <th>ObjektArtText</th>\n",
" <th>VorgangsTypID</th>\n",
" <th>VorgangsTypName</th>\n",
" <th>VorgangsDatum</th>\n",
" <th>VorgangsStatusId</th>\n",
" <th>VorgangsPrioritaet</th>\n",
" <th>VorgangsBeschreibung</th>\n",
" <th>VorgangsOrt</th>\n",
" <th>VorgangsArtText</th>\n",
" <th>ErledigungsDatum</th>\n",
" <th>ErledigungsArtText</th>\n",
" <th>ErledigungsBeschreibung</th>\n",
" <th>MPMelderArbeitsplatz</th>\n",
" <th>MPAbteilungBezeichnung</th>\n",
" <th>Arbeitsbeginn</th>\n",
" <th>ErstellungsDatum</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>288</th>\n",
" <td>155717</td>\n",
" <td>187</td>\n",
" <td>246, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>289</th>\n",
" <td>152507</td>\n",
" <td>177</td>\n",
" <td>204 S SI , Webmaschine, DL 280 EMS Breite 220</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-09</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-09</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-04-09</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>318</th>\n",
" <td>255972</td>\n",
" <td>249</td>\n",
" <td>203 C S SI, Webmaschine, DL 280 EMS Breite 220</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-07-30</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-07-30</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-07-30</td>\n",
" <td>2022-04-28</td>\n",
" </tr>\n",
" <tr>\n",
" <th>319</th>\n",
" <td>255977</td>\n",
" <td>249</td>\n",
" <td>203 C S SI, Webmaschine, DL 280 EMS Breite 220</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-08-04</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-08-04</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-08-04</td>\n",
" <td>2022-04-28</td>\n",
" </tr>\n",
" <tr>\n",
" <th>340</th>\n",
" <td>267942</td>\n",
" <td>187</td>\n",
" <td>246, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-08-07</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-08-07</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-08-07</td>\n",
" <td>2022-08-05</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" VorgangsID ObjektID HObjektText \\\n",
"288 155717 187 246, Webmaschine Jacquard, \n",
"289 152507 177 204 S SI , Webmaschine, DL 280 EMS Breite 220 \n",
"318 255972 249 203 C S SI, Webmaschine, DL 280 EMS Breite 220 \n",
"319 255977 249 203 C S SI, Webmaschine, DL 280 EMS Breite 220 \n",
"340 267942 187 246, Webmaschine Jacquard, \n",
"\n",
" ObjektArtID ObjektArtText VorgangsTypID VorgangsTypName \\\n",
"288 6 Jacquard-Webmaschine 1 Wartung \n",
"289 3 Luft-Webmaschine 1 Wartung \n",
"318 3 Luft-Webmaschine 1 Wartung \n",
"319 3 Luft-Webmaschine 1 Wartung \n",
"340 6 Jacquard-Webmaschine 1 Wartung \n",
"\n",
" VorgangsDatum VorgangsStatusId VorgangsPrioritaet \\\n",
"288 2022-04-01 5 0 \n",
"289 2022-04-09 5 0 \n",
"318 2022-07-30 5 0 \n",
"319 2022-08-04 5 0 \n",
"340 2022-08-07 5 0 \n",
"\n",
" VorgangsBeschreibung VorgangsOrt \\\n",
"288 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"289 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"318 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"319 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"340 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"\n",
" VorgangsArtText ErledigungsDatum \\\n",
"288 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"289 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-09 \n",
"318 Tägliche Interne Wartungstätigkeiten Weberei 2022-07-30 \n",
"319 Tägliche Interne Wartungstätigkeiten Weberei 2022-08-04 \n",
"340 Tägliche Interne Wartungstätigkeiten Weberei 2022-08-07 \n",
"\n",
" ErledigungsArtText \\\n",
"288 Intern UTT - Sichtkontrolle \n",
"289 Intern UTT - Sichtkontrolle \n",
"318 Intern UTT - Sichtkontrolle \n",
"319 Intern UTT - Sichtkontrolle \n",
"340 Intern UTT - Sichtkontrolle \n",
"\n",
" ErledigungsBeschreibung MPMelderArbeitsplatz \\\n",
"288 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"289 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"318 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"319 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"340 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"\n",
" MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n",
"288 NaN 2022-04-01 2022-02-17 \n",
"289 NaN 2022-04-09 2022-02-17 \n",
"318 NaN 2022-07-30 2022-04-28 \n",
"319 NaN 2022-08-04 2022-04-28 \n",
"340 NaN 2022-08-07 2022-08-05 "
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp.head()"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [],
"source": [
    "# Check which features differ between the rows\n",
"analyse_columns = ['ObjektID', 'VorgangsTypID', 'VorgangsTypName']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"ObjektID"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 187, 177, 249, 2654, 1792, 272, 271, 270, 269, 268, 186,\n",
" 178, 179, 2317, 2318, 2473, 2559, 1244, 240, 241, 180, 220,\n",
" 221, 222, 223, 224, 961, 962, 2166, 3212, 267, 266, 181,\n",
" 182, 213, 214, 174, 175, 176, 156, 157, 158, 247, 248,\n",
" 183, 265, 278, 1793, 1794, 218, 217, 219, 215, 216, 2319,\n",
" 2320, 228, 184, 152, 153, 2165, 154, 155, 159, 167, 168,\n",
" 169, 2313, 2314, 2315, 2316, 212, 211, 160, 161, 162, 164,\n",
" 165, 166, 264, 273, 274, 277, 276, 275, 279, 280, 281,\n",
" 282, 283, 242, 243, 244, 245, 246, 225, 227, 229, 170,\n",
" 171, 172, 173, 230, 231, 3213, 3211, 3214], dtype=int64)"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp['ObjektID'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [],
"source": [
"filt = temp['ObjektID'] == 2318\n",
"temp_fil1 = temp[filt]"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>VorgangsID</th>\n",
" <th>ObjektID</th>\n",
" <th>HObjektText</th>\n",
" <th>ObjektArtID</th>\n",
" <th>ObjektArtText</th>\n",
" <th>VorgangsTypID</th>\n",
" <th>VorgangsTypName</th>\n",
" <th>VorgangsDatum</th>\n",
" <th>VorgangsStatusId</th>\n",
" <th>VorgangsPrioritaet</th>\n",
" <th>VorgangsBeschreibung</th>\n",
" <th>VorgangsOrt</th>\n",
" <th>VorgangsArtText</th>\n",
" <th>ErledigungsDatum</th>\n",
" <th>ErledigungsArtText</th>\n",
" <th>ErledigungsBeschreibung</th>\n",
" <th>MPMelderArbeitsplatz</th>\n",
" <th>MPAbteilungBezeichnung</th>\n",
" <th>Arbeitsbeginn</th>\n",
" <th>ErstellungsDatum</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>878</th>\n",
" <td>269743</td>\n",
" <td>2318</td>\n",
" <td>A067, Webmaschine, DL 280 EMS Breite 280</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-10-31</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-10-31</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-10-31</td>\n",
" <td>2022-08-05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6099</th>\n",
" <td>152490</td>\n",
" <td>2318</td>\n",
" <td>A067, Webmaschine, DL 280 EMS Breite 280</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-03-24</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-03-24</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-03-24</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13905</th>\n",
" <td>152476</td>\n",
" <td>2318</td>\n",
" <td>A067, Webmaschine, DL 280 EMS Breite 280</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-03-10</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-03-10</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-03-10</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14019</th>\n",
" <td>248301</td>\n",
" <td>2318</td>\n",
" <td>A067, Webmaschine, DL 280 EMS Breite 280</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-28</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-28</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-04-28</td>\n",
" <td>2022-04-14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14211</th>\n",
" <td>254914</td>\n",
" <td>2318</td>\n",
" <td>A067, Webmaschine, DL 280 EMS Breite 280</td>\n",
" <td>3</td>\n",
" <td>Luft-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-05-19</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-05-19</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-05-19</td>\n",
" <td>2022-04-28</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" VorgangsID ObjektID HObjektText \\\n",
"878 269743 2318 A067, Webmaschine, DL 280 EMS Breite 280 \n",
"6099 152490 2318 A067, Webmaschine, DL 280 EMS Breite 280 \n",
"13905 152476 2318 A067, Webmaschine, DL 280 EMS Breite 280 \n",
"14019 248301 2318 A067, Webmaschine, DL 280 EMS Breite 280 \n",
"14211 254914 2318 A067, Webmaschine, DL 280 EMS Breite 280 \n",
"\n",
" ObjektArtID ObjektArtText VorgangsTypID VorgangsTypName \\\n",
"878 3 Luft-Webmaschine 1 Wartung \n",
"6099 3 Luft-Webmaschine 1 Wartung \n",
"13905 3 Luft-Webmaschine 1 Wartung \n",
"14019 3 Luft-Webmaschine 1 Wartung \n",
"14211 3 Luft-Webmaschine 1 Wartung \n",
"\n",
" VorgangsDatum VorgangsStatusId VorgangsPrioritaet \\\n",
"878 2022-10-31 5 0 \n",
"6099 2022-03-24 5 0 \n",
"13905 2022-03-10 5 0 \n",
"14019 2022-04-28 5 0 \n",
"14211 2022-05-19 5 0 \n",
"\n",
" VorgangsBeschreibung VorgangsOrt \\\n",
"878 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"6099 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"13905 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"14019 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"14211 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"\n",
" VorgangsArtText ErledigungsDatum \\\n",
"878 Tägliche Interne Wartungstätigkeiten Weberei 2022-10-31 \n",
"6099 Tägliche Interne Wartungstätigkeiten Weberei 2022-03-24 \n",
"13905 Tägliche Interne Wartungstätigkeiten Weberei 2022-03-10 \n",
"14019 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-28 \n",
"14211 Tägliche Interne Wartungstätigkeiten Weberei 2022-05-19 \n",
"\n",
" ErledigungsArtText \\\n",
"878 Intern UTT - Sichtkontrolle \n",
"6099 Intern UTT - Sichtkontrolle \n",
"13905 Intern UTT - Sichtkontrolle \n",
"14019 Intern UTT - Sichtkontrolle \n",
"14211 Intern UTT - Sichtkontrolle \n",
"\n",
" ErledigungsBeschreibung MPMelderArbeitsplatz \\\n",
"878 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"6099 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"13905 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"14019 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"14211 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"\n",
" MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n",
"878 NaN 2022-10-31 2022-08-05 \n",
"6099 NaN 2022-03-24 2022-02-17 \n",
"13905 NaN 2022-03-10 2022-02-17 \n",
"14019 NaN 2022-04-28 2022-04-14 \n",
"14211 NaN 2022-05-19 2022-04-28 "
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp_fil1.head()"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<DatetimeArray>\n",
"['2022-10-31 00:00:00', '2022-03-24 00:00:00', '2022-03-10 00:00:00',\n",
" '2022-04-28 00:00:00', '2022-05-19 00:00:00', '2022-04-09 00:00:00',\n",
" '2022-04-21 00:00:00', '2022-06-11 00:00:00', '2022-05-12 00:00:00',\n",
" '2022-04-23 00:00:00',\n",
" ...\n",
" '2022-10-28 00:00:00', '2022-07-06 00:00:00', '2023-06-14 00:00:00',\n",
" '2022-10-29 00:00:00', '2022-07-07 00:00:00', '2023-06-15 00:00:00',\n",
" '2022-05-05 00:00:00', '2022-10-30 00:00:00', '2022-07-08 00:00:00',\n",
" '2022-10-19 00:00:00']\n",
"Length: 462, dtype: datetime64[ns]"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp_fil1['VorgangsDatum'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"462"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(temp_fil1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"VorgangsID"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
      "Number of unique VorgangsID values: 1855 (3.89 % of all rows)\n"
     ]
    }
   ],
   "source": [
    "uni_VorgangsID = temp['VorgangsID'].unique()\n",
    "num_uni_VorgangsID = len(uni_VorgangsID)\n",
    "print(f'Number of unique VorgangsID values: {num_uni_VorgangsID} ({num_uni_VorgangsID / len(temp) * 100:.2f} % of all rows)')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"155717"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"uni_VorgangsID[0]"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"filt = temp['VorgangsID'] == uni_VorgangsID[0]\n",
"temp_fil1 = temp[filt]"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>VorgangsID</th>\n",
" <th>ObjektID</th>\n",
" <th>HObjektText</th>\n",
" <th>ObjektArtID</th>\n",
" <th>ObjektArtText</th>\n",
" <th>VorgangsTypID</th>\n",
" <th>VorgangsTypName</th>\n",
" <th>VorgangsDatum</th>\n",
" <th>VorgangsStatusId</th>\n",
" <th>VorgangsPrioritaet</th>\n",
" <th>VorgangsBeschreibung</th>\n",
" <th>VorgangsOrt</th>\n",
" <th>VorgangsArtText</th>\n",
" <th>ErledigungsDatum</th>\n",
" <th>ErledigungsArtText</th>\n",
" <th>ErledigungsBeschreibung</th>\n",
" <th>MPMelderArbeitsplatz</th>\n",
" <th>MPAbteilungBezeichnung</th>\n",
" <th>Arbeitsbeginn</th>\n",
" <th>ErstellungsDatum</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>288</th>\n",
" <td>155717</td>\n",
" <td>187</td>\n",
" <td>246, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2718</th>\n",
" <td>155717</td>\n",
" <td>1792</td>\n",
" <td>A057, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2719</th>\n",
" <td>155717</td>\n",
" <td>186</td>\n",
" <td>245 J, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2720</th>\n",
" <td>155717</td>\n",
" <td>2473</td>\n",
" <td>A056, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5504</th>\n",
" <td>155717</td>\n",
" <td>2559</td>\n",
" <td>A070, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5505</th>\n",
" <td>155717</td>\n",
" <td>961</td>\n",
" <td>A054, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5506</th>\n",
" <td>155717</td>\n",
" <td>962</td>\n",
" <td>A055, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5507</th>\n",
" <td>155717</td>\n",
" <td>2166</td>\n",
" <td>A061, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5508</th>\n",
" <td>155717</td>\n",
" <td>1793</td>\n",
" <td>A058, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5509</th>\n",
" <td>155717</td>\n",
" <td>1794</td>\n",
" <td>A059, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8294</th>\n",
" <td>155717</td>\n",
" <td>2165</td>\n",
" <td>A060, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>NaN</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" VorgangsID ObjektID HObjektText ObjektArtID \\\n",
"288 155717 187 246, Webmaschine Jacquard, 6 \n",
"2718 155717 1792 A057, Webmaschine Jacquard, 6 \n",
"2719 155717 186 245 J, Webmaschine Jacquard, 6 \n",
"2720 155717 2473 A056, Webmaschine Jacquard, 6 \n",
"5504 155717 2559 A070, Webmaschine Jacquard, 6 \n",
"5505 155717 961 A054, Webmaschine Jacquard, 6 \n",
"5506 155717 962 A055, Webmaschine Jacquard, 6 \n",
"5507 155717 2166 A061, Webmaschine Jacquard, 6 \n",
"5508 155717 1793 A058, Webmaschine Jacquard, 6 \n",
"5509 155717 1794 A059, Webmaschine Jacquard, 6 \n",
"8294 155717 2165 A060, Webmaschine Jacquard, 6 \n",
"\n",
" ObjektArtText VorgangsTypID VorgangsTypName VorgangsDatum \\\n",
"288 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"2718 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"2719 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"2720 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5504 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5505 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5506 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5507 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5508 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5509 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"8294 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"\n",
" VorgangsStatusId VorgangsPrioritaet \\\n",
"288 5 0 \n",
"2718 5 0 \n",
"2719 5 0 \n",
"2720 5 0 \n",
"5504 5 0 \n",
"5505 5 0 \n",
"5506 5 0 \n",
"5507 5 0 \n",
"5508 5 0 \n",
"5509 5 0 \n",
"8294 5 0 \n",
"\n",
" VorgangsBeschreibung VorgangsOrt \\\n",
"288 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"2718 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"2719 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"2720 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"5504 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"5505 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"5506 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"5507 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"5508 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"5509 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"8294 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n",
"\n",
" VorgangsArtText ErledigungsDatum \\\n",
"288 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"2718 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"2719 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"2720 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5504 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5505 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5506 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5507 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5508 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5509 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"8294 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"\n",
" ErledigungsArtText \\\n",
"288 Intern UTT - Sichtkontrolle \n",
"2718 Intern UTT - Sichtkontrolle \n",
"2719 Intern UTT - Sichtkontrolle \n",
"2720 Intern UTT - Sichtkontrolle \n",
"5504 Intern UTT - Sichtkontrolle \n",
"5505 Intern UTT - Sichtkontrolle \n",
"5506 Intern UTT - Sichtkontrolle \n",
"5507 Intern UTT - Sichtkontrolle \n",
"5508 Intern UTT - Sichtkontrolle \n",
"5509 Intern UTT - Sichtkontrolle \n",
"8294 Intern UTT - Sichtkontrolle \n",
"\n",
" ErledigungsBeschreibung MPMelderArbeitsplatz \\\n",
"288 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"2718 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"2719 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"2720 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"5504 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"5505 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"5506 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"5507 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"5508 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"5509 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"8294 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n",
"\n",
" MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n",
"288 NaN 2022-04-01 2022-02-17 \n",
"2718 NaN 2022-04-01 2022-02-17 \n",
"2719 NaN 2022-04-01 2022-02-17 \n",
"2720 NaN 2022-04-01 2022-02-17 \n",
"5504 NaN 2022-04-01 2022-02-17 \n",
"5505 NaN 2022-04-01 2022-02-17 \n",
"5506 NaN 2022-04-01 2022-02-17 \n",
"5507 NaN 2022-04-01 2022-02-17 \n",
"5508 NaN 2022-04-01 2022-02-17 \n",
"5509 NaN 2022-04-01 2022-02-17 \n",
"8294 NaN 2022-04-01 2022-02-17 "
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp_fil1"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
      "Number of rows with the selected VorgangsID: 11\n",
      "Number of unique ObjektIDs among them: 11\n"
     ]
    }
   ],
   "source": [
    "temp_fil2 = temp_fil1.fillna(value=False)\n",
    "print(f'Number of rows with the selected VorgangsID: {len(temp_fil2)}')\n",
    "uni_obj_id = len(temp_fil2['ObjektID'].unique())\n",
    "print(f'Number of unique ObjektIDs among them: {uni_obj_id}')"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 187, 1792, 186, 2473, 2559, 961, 962, 2166, 1793, 1794, 2165],\n",
" dtype=int64)"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp_fil2['ObjektID'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>VorgangsID</th>\n",
" <th>ObjektID</th>\n",
" <th>HObjektText</th>\n",
" <th>ObjektArtID</th>\n",
" <th>ObjektArtText</th>\n",
" <th>VorgangsTypID</th>\n",
" <th>VorgangsTypName</th>\n",
" <th>VorgangsDatum</th>\n",
" <th>VorgangsStatusId</th>\n",
" <th>VorgangsPrioritaet</th>\n",
" <th>VorgangsBeschreibung</th>\n",
" <th>VorgangsOrt</th>\n",
" <th>VorgangsArtText</th>\n",
" <th>ErledigungsDatum</th>\n",
" <th>ErledigungsArtText</th>\n",
" <th>ErledigungsBeschreibung</th>\n",
" <th>MPMelderArbeitsplatz</th>\n",
" <th>MPAbteilungBezeichnung</th>\n",
" <th>Arbeitsbeginn</th>\n",
" <th>ErstellungsDatum</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>288</th>\n",
" <td>155717</td>\n",
" <td>187</td>\n",
" <td>246, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2718</th>\n",
" <td>155717</td>\n",
" <td>1792</td>\n",
" <td>A057, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2719</th>\n",
" <td>155717</td>\n",
" <td>186</td>\n",
" <td>245 J, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2720</th>\n",
" <td>155717</td>\n",
" <td>2473</td>\n",
" <td>A056, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5504</th>\n",
" <td>155717</td>\n",
" <td>2559</td>\n",
" <td>A070, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5505</th>\n",
" <td>155717</td>\n",
" <td>961</td>\n",
" <td>A054, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5506</th>\n",
" <td>155717</td>\n",
" <td>962</td>\n",
" <td>A055, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5507</th>\n",
" <td>155717</td>\n",
" <td>2166</td>\n",
" <td>A061, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5508</th>\n",
" <td>155717</td>\n",
" <td>1793</td>\n",
" <td>A058, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5509</th>\n",
" <td>155717</td>\n",
" <td>1794</td>\n",
" <td>A059, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8294</th>\n",
" <td>155717</td>\n",
" <td>2165</td>\n",
" <td>A060, Webmaschine Jacquard,</td>\n",
" <td>6</td>\n",
" <td>Jacquard-Webmaschine</td>\n",
" <td>1</td>\n",
" <td>Wartung</td>\n",
" <td>2022-04-01</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>Tägliche Wartungstätigkeiten nach Vorgabe des ...</td>\n",
" <td>False</td>\n",
" <td>Tägliche Interne Wartungstätigkeiten Weberei</td>\n",
" <td>2022-04-01</td>\n",
" <td>Intern UTT - Sichtkontrolle</td>\n",
" <td>Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten...</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>2022-04-01</td>\n",
" <td>2022-02-17</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" VorgangsID ObjektID HObjektText ObjektArtID \\\n",
"288 155717 187 246, Webmaschine Jacquard, 6 \n",
"2718 155717 1792 A057, Webmaschine Jacquard, 6 \n",
"2719 155717 186 245 J, Webmaschine Jacquard, 6 \n",
"2720 155717 2473 A056, Webmaschine Jacquard, 6 \n",
"5504 155717 2559 A070, Webmaschine Jacquard, 6 \n",
"5505 155717 961 A054, Webmaschine Jacquard, 6 \n",
"5506 155717 962 A055, Webmaschine Jacquard, 6 \n",
"5507 155717 2166 A061, Webmaschine Jacquard, 6 \n",
"5508 155717 1793 A058, Webmaschine Jacquard, 6 \n",
"5509 155717 1794 A059, Webmaschine Jacquard, 6 \n",
"8294 155717 2165 A060, Webmaschine Jacquard, 6 \n",
"\n",
" ObjektArtText VorgangsTypID VorgangsTypName VorgangsDatum \\\n",
"288 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"2718 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"2719 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"2720 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5504 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5505 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5506 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5507 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5508 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"5509 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"8294 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n",
"\n",
" VorgangsStatusId VorgangsPrioritaet \\\n",
"288 5 0 \n",
"2718 5 0 \n",
"2719 5 0 \n",
"2720 5 0 \n",
"5504 5 0 \n",
"5505 5 0 \n",
"5506 5 0 \n",
"5507 5 0 \n",
"5508 5 0 \n",
"5509 5 0 \n",
"8294 5 0 \n",
"\n",
" VorgangsBeschreibung VorgangsOrt \\\n",
"288 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"2718 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"2719 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"2720 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"5504 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"5505 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"5506 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"5507 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"5508 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"5509 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"8294 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n",
"\n",
" VorgangsArtText ErledigungsDatum \\\n",
"288 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"2718 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"2719 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"2720 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5504 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5505 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5506 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5507 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5508 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"5509 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"8294 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n",
"\n",
" ErledigungsArtText \\\n",
"288 Intern UTT - Sichtkontrolle \n",
"2718 Intern UTT - Sichtkontrolle \n",
"2719 Intern UTT - Sichtkontrolle \n",
"2720 Intern UTT - Sichtkontrolle \n",
"5504 Intern UTT - Sichtkontrolle \n",
"5505 Intern UTT - Sichtkontrolle \n",
"5506 Intern UTT - Sichtkontrolle \n",
"5507 Intern UTT - Sichtkontrolle \n",
"5508 Intern UTT - Sichtkontrolle \n",
"5509 Intern UTT - Sichtkontrolle \n",
"8294 Intern UTT - Sichtkontrolle \n",
"\n",
" ErledigungsBeschreibung MPMelderArbeitsplatz \\\n",
"288 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"2718 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"2719 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"2720 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"5504 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"5505 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"5506 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"5507 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"5508 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"5509 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"8294 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n",
"\n",
" MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n",
"288 False 2022-04-01 2022-02-17 \n",
"2718 False 2022-04-01 2022-02-17 \n",
"2719 False 2022-04-01 2022-02-17 \n",
"2720 False 2022-04-01 2022-02-17 \n",
"5504 False 2022-04-01 2022-02-17 \n",
"5505 False 2022-04-01 2022-02-17 \n",
"5506 False 2022-04-01 2022-02-17 \n",
"5507 False 2022-04-01 2022-02-17 \n",
"5508 False 2022-04-01 2022-02-17 \n",
"5509 False 2022-04-01 2022-02-17 \n",
"8294 False 2022-04-01 2022-02-17 "
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp_fil2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "*Question: Can several ObjektIDs be assigned to a single Vorgang (operation)? If so, why do the completion dates (ErledigungsDatum) differ?*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "**Length of the descriptions**"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [],
"source": [
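    "# Convert the Series to a DataFrame and add the character length of each description\n",
    "descriptions = descriptions.to_frame()\n",
    "descriptions['length_description'] = descriptions['VorgangsBeschreibung'].map(len)\n",
    "# Longest descriptions first\n",
    "descriptions = descriptions.sort_values(by='length_description', ascending=False)"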
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count 124008.000000\n",
"mean 70.351751\n",
"std 53.080901\n",
"min 1.000000\n",
"25% 66.000000\n",
"50% 66.000000\n",
"75% 67.000000\n",
"max 3137.000000\n",
"Name: length_description, dtype: float64"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
    "# summary statistics of the description lengths (in characters)\n",
"len_descr = descriptions['length_description']\n",
"len_descr.describe()"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>VorgangsBeschreibung</th>\n",
" <th>length_description</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>8704</th>\n",
" <td>Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /...</td>\n",
" <td>3137</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7826</th>\n",
" <td>Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /...</td>\n",
" <td>3137</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49779</th>\n",
" <td>Laut Wartungsvertrag (Hr.Radtke) Bestellnummer...</td>\n",
" <td>2311</td>\n",
" </tr>\n",
" <tr>\n",
" <th>124118</th>\n",
" <td>Laut Wartungsvertrag (Hr.Radtke) Bestellnummer...</td>\n",
" <td>2311</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14853</th>\n",
" <td>Laut Wartungsvertrag (Hr.Radtke) Bestellnummer...</td>\n",
" <td>2311</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" VorgangsBeschreibung length_description\n",
"8704 Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /... 3137\n",
"7826 Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /... 3137\n",
"49779 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311\n",
"124118 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311\n",
"14853 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"descriptions.head()"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>VorgangsBeschreibung</th>\n",
" <th>length_description</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>8704</th>\n",
" <td>Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /...</td>\n",
" <td>3137</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7826</th>\n",
" <td>Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /...</td>\n",
" <td>3137</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49779</th>\n",
" <td>Laut Wartungsvertrag (Hr.Radtke) Bestellnummer...</td>\n",
" <td>2311</td>\n",
" </tr>\n",
" <tr>\n",
" <th>124118</th>\n",
" <td>Laut Wartungsvertrag (Hr.Radtke) Bestellnummer...</td>\n",
" <td>2311</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14853</th>\n",
" <td>Laut Wartungsvertrag (Hr.Radtke) Bestellnummer...</td>\n",
" <td>2311</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13450</th>\n",
" <td></td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13451</th>\n",
" <td></td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29979</th>\n",
" <td></td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13452</th>\n",
" <td></td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21214</th>\n",
" <td>\\n</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>124008 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" VorgangsBeschreibung length_description\n",
"8704 Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /... 3137\n",
"7826 Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /... 3137\n",
"49779 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311\n",
"124118 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311\n",
"14853 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311\n",
"... ... ...\n",
"13450 1\n",
"13451 1\n",
"29979 1\n",
"13452 1\n",
"21214 \\n 1\n",
"\n",
"[124008 rows x 2 columns]"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"descriptions"
]
  }
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}