{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# **Analysis 2**\n", "\n", "## Strategy & Focus\n", "\n", "- Focus on exports 4 and 5 (the largest datasets)\n", "  - each dataset belongs to a different customer\n", "  - consequently, IDs and their associated descriptions differ between datasets, and the same ``ObjektID`` is assigned more than once\n", "\n", "### Aspect 1 - Work-order descriptions\n", "\n", "- Analyse ``VorgangsBeschreibung`` for possible clusters:\n", "  - possibly derive standardised, selectable descriptions\n", "  - typical terms and how often they recur\n", "- Additional information from ``VorgangsArtText``:\n", "  - partially standardised\n", "  - *Is its link to ``VorgangsBeschreibung`` semantically correct?*\n", "- Additional information from ``VorgangsTypName`` together with ``VorgangsTypID``:\n", "  - definitely standardised\n", "  - *How many unique types are there?*\n", "\n", "### Aspect 2 - Time relations within work orders\n", "\n", "- *Identify objects that occur frequently*\n", "- *Examine the time intervals between creation, scheduling and completion:*\n", "  - Creation: ``ErstellungsDatum``\n", "  - Scheduling: ``VorgangsDatum``\n", "  - Completion: ``ErledigungsDatum``\n", "- *Intervals between two similar fault patterns, either for each object or only for the most frequently occurring objects*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "# Aspect 1: Clustering of work-order descriptions\n", "\n", "## Research\n", "[Textmining HS Hannover](https://textmining.wp.hs-hannover.de/Preprocessing.html)\n", "\n", "### General decomposition of the individual descriptions\n", "\n", "- Text into sentences\n", "- Sentences into words\n", "- Words into their base form:\n", "  - Lemma: the form of the word as listed in a dictionary, e.g. Haus, laufen, begründen\n", "  - Stem: the word without inflectional affixes (prefixes and suffixes), e.g. Haus, lauf, begründ\n", "  - Root: the core of the word from which it may have been derived, e.g. Haus, lauf, Grund\n", "- Part-of-speech and entity tagging\n", "  - classic part-of-speech recognition (conventional word classes)\n", "  - Named Entity Recognition (NER) (proper nouns)\n", "    - e.g. spaCy: person, location, organisation, miscellaneous\n", "\n", "#### Semantics\n", "\n", "- Words within a sentence are more closely related to one another than to words outside that sentence\n", "\n", "### Packages\n", "\n", "- English:\n", "  - [NLTK](https://www.nltk.org/)\n", "- German:\n", "  - [HanTa - The Hanover Tagger](https://github.com/wartaal/HanTa/tree/master)\n", "  - [TreeTagger](https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/)\n", "    - [Python Wrapper](https://treetaggerwrapper.readthedocs.io/en/latest/)\n", "  - [spaCy](https://spacy.io/)\n", "    - [Example 1](https://www.trinnovative.de/blog/2020-09-08-natural-language-processing-mit-spacy.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysis" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import spacy\n", "from collections import Counter\n", "from itertools import combinations\n", "from dateutil.parser import parse\n", "import re\n", "from spellchecker import SpellChecker\n", "\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "import logging\n", "import sys\n", "import pickle\n", "\n", "LOGGING_LEVEL = 'INFO'\n", "logging.basicConfig(level=LOGGING_LEVEL, stream=sys.stdout)\n", "logger = logging.getLogger('base')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def save_pickle(obj, path):\n", "    with open(path, 'wb') as file:\n", "        pickle.dump(obj, file, protocol=pickle.HIGHEST_PROTOCOL)\n", "\n", "def load_pickle(path):\n", "    with open(path, 'rb') as file:\n", "        obj = pickle.load(file)\n", "    return obj" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "sns.set()\n", "# if True, load previously computed results from disk instead of recomputing them\n", "LOAD_CALC_FILES = 
False\n", "\n", "DESC_BLACKLIST = set(['-'])\n", "\"\"\"\n", "GENERAL_BLACKLIST = set([\n", "    'herr', 'hr.', 'förster', 'graf', 'stöppel', \n", "    'stab', 'kw', 'h.', 'koch', 'heininger', '.',\n", "    'schwab', 'm.', 'wenninger', '-', '--',\n", "])\n", "\"\"\"\n", "\n", "# tokens to ignore (salutations, titles, abbreviations, stray punctuation)\n", "GENERAL_BLACKLIST = set([\n", "    'herr', 'hr.', 'kw', 'h.', '.',\n", "    'm.', '-', '--', 'dr.', 'dr',\n", "])\n", "\n", "#GENERAL_BLACKLIST = set()\n", "#POS_of_interest = set(['NOUN', 'PROPN', 'ADJ', 'VERB', 'AUX'])\n", "# Universal POS tags (spaCy token.pos_) of interest\n", "POS_of_interest = set(['NOUN', 'ADJ', 'VERB', 'AUX'])\n", "# fine-grained STTS tags (spaCy token.tag_ for German) of interest; ADJD = adverbial or predicative adjective\n", "TAG_of_interest = set(['ADJD'])" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# load the German spaCy transformer pipeline\n", "nlp = spacy.load('de_dep_news_trf')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 129020 entries, 0 to 129019\n", "Data columns (total 20 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 VorgangsID 129020 non-null int64 \n", " 1 ObjektID 129020 non-null int64 \n", " 2 HObjektText 129003 non-null object \n", " 3 ObjektArtID 129020 non-null int64 \n", " 4 ObjektArtText 128372 non-null object \n", " 5 VorgangsTypID 129020 non-null int64 \n", " 6 VorgangsTypName 129020 non-null object \n", " 7 VorgangsDatum 129020 non-null datetime64[ns]\n", " 8 VorgangsStatusId 129020 non-null int64 \n", " 9 VorgangsPrioritaet 129020 non-null int64 \n", " 10 VorgangsBeschreibung 124087 non-null object \n", " 11 VorgangsOrt 507 non-null object \n", " 12 VorgangsArtText 129020 non-null object \n", " 13 ErledigungsDatum 129020 non-null datetime64[ns]\n", " 14 ErledigungsArtText 128474 non-null object \n", " 15 ErledigungsBeschreibung 118135 non-null object \n", " 16 MPMelderArbeitsplatz 6359 non-null object \n", " 17 MPAbteilungBezeichnung 6359 non-null object \n", " 18 Arbeitsbeginn 123538 non-null datetime64[ns]\n", " 19 ErstellungsDatum 129020 
non-null datetime64[ns]\n", "dtypes: datetime64[ns](4), int64(6), object(10)\n", "memory usage: 19.7+ MB\n" ] } ], "source": [ "# load dataset\n", "FILE_PATH = '01_2_Rohdaten_neu/Export4.csv'\n", "date_cols = ['VorgangsDatum', 'ErledigungsDatum', 'Arbeitsbeginn', 'ErstellungsDatum']\n", "raw = pd.read_csv(filepath_or_buffer=FILE_PATH, sep=';', encoding='cp1252', parse_dates=date_cols, dayfirst=True)\n", "raw.info()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " VorgangsID ObjektID HObjektText \\\n", "0 11 114 427 C , Webmaschine, DL 280 EMS Breite 280 \n", "1 17 124 621 C , Webmaschine, DL 280 EMS Breite 280 \n", "2 53 244 285 C, Webmaschine, SG 220 EMS \n", "3 58 257 107, Webmaschine, OM 220 EOS \n", "4 81 138 00138, Schärmaschine 9, \n", "\n", " ObjektArtID ObjektArtText VorgangsTypID VorgangsTypName \\\n", "0 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n", "1 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n", "2 5 Greifer-Webmaschine 3 Reparaturauftrag (Portal) \n", "3 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n", "4 16 Schärmaschine 3 Reparaturauftrag (Portal) \n", "\n", " VorgangsDatum VorgangsStatusId VorgangsPrioritaet \\\n", "0 2019-03-06 4 0 \n", "1 2019-03-11 5 0 \n", "2 2019-03-19 5 0 \n", "3 2019-03-21 5 0 \n", "4 2019-03-25 5 0 \n", "\n", " VorgangsBeschreibung VorgangsOrt \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 Kupplung schleift NaN \n", "3 Gegengewicht wieder anbringen NaN \n", "4 da ist etwas gebrochen. (Herr Heininger) NaN \n", "\n", " VorgangsArtText ErledigungsDatum \\\n", "0 Kettbaum kaputt 2019-03-06 \n", "1 asgasdg 2019-03-11 \n", "2 Kupplung defekt 2019-03-20 \n", "3 Gegengewicht an der Webmaschine abgefallen 2019-03-21 \n", "4 zentrale Bremsenverstellung linke Gatterseite ... 2019-03-25 \n", "\n", " ErledigungsArtText ErledigungsBeschreibung \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 Reparatur UTT NaN \n", "3 Reparatur UTT Schraube ausgebohrt\\nGegengewicht wieder angeb... \n", "4 Reparatur UTT Bolzen gebrochen. Bolzen neu angefertig und di... 
\n", "\n", " MPMelderArbeitsplatz MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n", "0 Weberei Weberei NaT 2019-03-06 \n", "1 Elektrowerkstatt Elektrowerkstatt NaT 2019-03-11 \n", "2 Weberei Weberei NaT 2019-03-19 \n", "3 Weberei Weberei 2019-03-21 2019-03-21 \n", "4 Vorwerk Vorwerk 2019-03-25 2019-03-25 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw.head()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of features: 20\n" ] } ], "source": [ "print(f\"Number of features: {len(raw.columns)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**New features (columns) compared with the previous analysis:**\n", "- ``ObjektArtID``\n", "- ``ObjektArtText``\n", "- ``VorgangsTypName``" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Duplicates" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# flag rows identical to an earlier row (the first occurrence is not flagged)\n", "duplicates_filt = raw.duplicated()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of duplicate rows: 84\n" ] } ], "source": [ "print(f\"Number of duplicate rows: {duplicates_filt.sum()}\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "filt_data = raw[duplicates_filt]\n", "uni_obj_id_dupl = filt_data['ObjektID'].unique()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of unique object IDs among duplicates: 47\n" ] } ], "source": [ "print(f\"Number of unique object IDs among duplicates: {len(uni_obj_id_dupl)}\")" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 128936 entries, 0 to 128935\n", "Data columns (total 20 columns):\n", " # Column Non-Null 
Count Dtype \n", "--- ------ -------------- ----- \n", " 0 VorgangsID 128936 non-null int64 \n", " 1 ObjektID 128936 non-null int64 \n", " 2 HObjektText 128920 non-null object \n", " 3 ObjektArtID 128936 non-null int64 \n", " 4 ObjektArtText 128289 non-null object \n", " 5 VorgangsTypID 128936 non-null int64 \n", " 6 VorgangsTypName 128936 non-null object \n", " 7 VorgangsDatum 128936 non-null datetime64[ns]\n", " 8 VorgangsStatusId 128936 non-null int64 \n", " 9 VorgangsPrioritaet 128936 non-null int64 \n", " 10 VorgangsBeschreibung 124008 non-null object \n", " 11 VorgangsOrt 507 non-null object \n", " 12 VorgangsArtText 128936 non-null object \n", " 13 ErledigungsDatum 128936 non-null datetime64[ns]\n", " 14 ErledigungsArtText 128402 non-null object \n", " 15 ErledigungsBeschreibung 118086 non-null object \n", " 16 MPMelderArbeitsplatz 6337 non-null object \n", " 17 MPAbteilungBezeichnung 6337 non-null object \n", " 18 Arbeitsbeginn 123480 non-null datetime64[ns]\n", " 19 ErstellungsDatum 128936 non-null datetime64[ns]\n", "dtypes: datetime64[ns](4), int64(6), object(10)\n", "memory usage: 19.7+ MB\n" ] } ], "source": [ "# drop exact duplicate rows and renumber the index from 0\n", "wo_duplicates = raw.drop_duplicates(ignore_index=True)\n", "wo_duplicates.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ``VorgangsBeschreibung``" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **NA values and duplicates**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "String cleaning" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "SPECIAL_CHARS = set(['&', '$', '%', '§', '/', '(', ')', '_', \n", "                     '+', '–', '--', '<', '>', '´',\n", "])" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def clean_string(string: str) -> str:\n", "    #num_reps = 5\n", "    \n", "    # replace control whitespace (tab, newline, carriage return, form feed, vertical tab) with a space\n", "    pattern = r'[\\t\\n\\r\\f\\v]'\n", "    string = re.sub(pattern, ' ', string)\n", "    # remove dates such as 25.03.2019 (this pattern also matches times such as 12:30:00)\n", "    pattern = r'[\\d]{1,4}[.:][\\d]{1,4}[.:][\\d]{1,4}'\n", "    string = re.sub(pattern, '', string)\n", "    # remove times such as 12:30:00\n", "    pattern = r'[\\d]{1,2}[:][\\d]{1,2}[:][\\d]{0,2}'\n", "    string = re.sub(pattern, '', string)\n", "    # remove all characters except spaces, word characters (incl. umlauts) and . , ; : -\n", "    pattern = r'[^ \\w.,;:\\-äöüÄÖÜ]+'\n", "    string = re.sub(pattern, '', string)\n", "    # replace a hyphen surrounded by non-word characters (i.e. used as a dash) with a space\n", "    pattern = r'[\\W]+-[\\W]+'\n", "    string = re.sub(pattern, ' ', string)\n", "    # remove spaces in front of punctuation\n", "    pattern = r'[ ]+([;,.:])'\n", "    string = re.sub(pattern, r'\\1', string)\n", "    # collapse runs of spaces into a single space\n", "    pattern = r'[ ]+'\n", "    string = re.sub(pattern, ' ', string)\n", "    # strip leading and trailing whitespace\n", "    string = string.strip()\n", "    \n", "    # earlier replace()-based approach, disabled\n", "    #while num_reps != 0:\n", "    #string = string.replace('\\n', ' ')\n", "    #string = string.replace('\\t', ' ')\n", "    #string = string.replace('  ', ' ')\n", "    #string = string.replace('  ', ' ')\n", "    #string = string.replace(' - ', ' ')\n", "    \"\"\"\n", "    for char in SPECIAL_CHARS:\n", "        string = string.replace(char, '')\n", "    \n", "    #num_reps -= 1\n", "    \n", "    # remove spaces at the beginning and the end\n", "    string = string.strip()\n", "    \"\"\"\n", "    \n", "    return string" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# drop rows without a description and normalise the remaining ones with clean_string\n", "base = wo_duplicates.copy()\n", "base = base.dropna(axis=0, subset=['VorgangsBeschreibung'])\n", "base['VorgangsBeschreibung'] = base['VorgangsBeschreibung'].map(clean_string)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " VorgangsID ObjektID HObjektText \\\n", "2 53 244 285 C, Webmaschine, SG 220 EMS \n", "3 58 257 107, Webmaschine, OM 220 EOS \n", "4 81 138 00138, Schärmaschine 9, \n", "5 82 0 Warenschau allgemein \n", "6 76 0 Neben der Türe \n", "... ... ... ... \n", "128931 518956 1708 01708, Betriebsfahrräder Schlosserei, \n", "128932 275123 1654 WEBEREI ALLGEMEIN, Weberei allgemein, \n", "128933 275125 1795 A054.S, Jacquardmaschine, \n", "128934 275188 1 00001, Ausrüstungsanlage 1, \n", "128935 275219 326 B38, Niederhubwagen, \n", "\n", " ObjektArtID ObjektArtText \\\n", "2 5 Greifer-Webmaschine \n", "3 3 Luft-Webmaschine \n", "4 16 Schärmaschine \n", "5 0 NaN \n", "6 0 NaN \n", "... ... ... \n", "128931 57 Interne Wartungsobjekte \n", "128932 90 UTT allgemein \n", "128933 24 Stäubli-Jacquardmaschine \n", "128934 1 Waschmaschine \n", "128935 32 Flurförderzeuge / Putzmaschine / Rasenmäher \n", "\n", " VorgangsTypID VorgangsTypName VorgangsDatum \\\n", "2 3 Reparaturauftrag (Portal) 2019-03-19 \n", "3 3 Reparaturauftrag (Portal) 2019-03-21 \n", "4 3 Reparaturauftrag (Portal) 2019-03-25 \n", "5 3 Reparaturauftrag (Portal) 2019-03-25 \n", "6 3 Reparaturauftrag (Portal) 2019-03-22 \n", "... ... ... ... \n", "128931 1 Wartung 2023-06-19 \n", "128932 3 Reparaturauftrag (Portal) 2022-09-29 \n", "128933 3 Reparaturauftrag (Portal) 2022-09-30 \n", "128934 3 Reparaturauftrag (Portal) 2022-09-30 \n", "128935 3 Reparaturauftrag (Portal) 2022-10-03 \n", "\n", " VorgangsStatusId VorgangsPrioritaet \\\n", "2 5 0 \n", "3 5 0 \n", "4 5 0 \n", "5 5 0 \n", "6 5 0 \n", "... ... ... \n", "128931 5 0 \n", "128932 5 0 \n", "128933 5 0 \n", "128934 5 1 \n", "128935 5 0 \n", "\n", " VorgangsBeschreibung \\\n", "2 Kupplung schleift \n", "3 Gegengewicht wieder anbringen \n", "4 da ist etwas gebrochen. Herr Heininger \n", "5 Klappbügel Portalkran H31 defekt \n", "6 Schraube nix mer gut \n", "... ... \n", "128931 2-wöchige Reinigung Sichtkontrolle Technische ... 
\n", "128932 Adapter entfernen und Gewinde nachschneiden. \n", "128933 Alle 4 Schrauben und teile der Kettbaumlagerun... \n", "128934 Walzenlager WK 6 überprüfenauswechseln \n", "128935 Befestigung Deckel für Batteriefach defekt Hal... \n", "\n", " VorgangsOrt \\\n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "5 Warenschau allgemein \n", "6 Neben der Türe \n", "... ... \n", "128931 NaN \n", "128932 NaN \n", "128933 NaN \n", "128934 NaN \n", "128935 NaN \n", "\n", " VorgangsArtText ErledigungsDatum \\\n", "2 Kupplung defekt 2019-03-20 \n", "3 Gegengewicht an der Webmaschine abgefallen 2019-03-21 \n", "4 zentrale Bremsenverstellung linke Gatterseite ... 2019-03-25 \n", "5 Allgemeine Reparaturarbeiten 2019-03-25 \n", "6 Kettbaum 2019-03-25 \n", "... ... ... \n", "128931 02 Interne Reinigung / Pflege / Überprüfung 2023-06-19 \n", "128932 Kettbaum-Adapter 2022-09-30 \n", "128933 Kettbaum 2022-09-30 \n", "128934 Lagereinheit (Wälzlager, Kugellager, etc.) 2022-10-04 \n", "128935 Flurförderzeug 2022-10-05 \n", "\n", " ErledigungsArtText \\\n", "2 Reparatur UTT \n", "3 Reparatur UTT \n", "4 Reparatur UTT \n", "5 Reparatur UTT \n", "6 Reparatur UTT \n", "... ... \n", "128931 Intern UTT - Prüfung \n", "128932 Intern UTT - Reparatur \n", "128933 Intern UTT - Reparatur \n", "128934 Intern UTT - Reparatur \n", "128935 Intern UTT - Reparatur \n", "\n", " ErledigungsBeschreibung \\\n", "2 NaN \n", "3 Schraube ausgebohrt\\nGegengewicht wieder angeb... \n", "4 Bolzen gebrochen. Bolzen neu angefertig und di... \n", "5 Feder ausgetauscht \n", "6 Schrauben ausgebohrt\\t\\nGewinde nachgeschnitten\\t \n", "... ... \n", "128931 Reinigung & Sichtkontrolle (Technische Einric... 
\n", "128932 mit schlosserei aufräumen \n", "128933 Neues Teil eingebaut und altes repariert \n", "128934 Lager getauscht \n", "128935 Neue Gasfeder eingebaut \n", "\n", " MPMelderArbeitsplatz MPAbteilungBezeichnung Arbeitsbeginn \\\n", "2 Weberei Weberei NaT \n", "3 Weberei Weberei 2019-03-21 \n", "4 Vorwerk Vorwerk 2019-03-25 \n", "5 Warenschau Warenschau 2019-03-25 \n", "6 Vorwerk Vorwerk 2019-03-25 \n", "... ... ... ... \n", "128931 NaN NaN 2023-06-19 \n", "128932 Weberei Weberei 2022-09-30 \n", "128933 Weberei Weberei 2022-09-30 \n", "128934 Ausrüstung Ausrüstung 2022-10-04 \n", "128935 Warenschau Warenschau 2022-10-04 \n", "\n", " ErstellungsDatum \n", "2 2019-03-19 \n", "3 2019-03-21 \n", "4 2019-03-25 \n", "5 2019-03-25 \n", "6 2019-03-22 \n", "... ... \n", "128931 2023-03-14 \n", "128932 2022-09-29 \n", "128933 2022-09-30 \n", "128934 2022-09-30 \n", "128935 2022-10-03 \n", "\n", "[124008 rows x 20 columns]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "base" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of entries: 124008\n" ] } ], "source": [ "descriptions = base['VorgangsBeschreibung']\n", "print(f\"Number of entries: {len(descriptions)}\")" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of duplicate descriptions: 117255\n", "Number of unique descriptions: 6753\n", "Share of unique descriptions: 5.45 %\n" ] } ], "source": [ "# a description counts as a duplicate if it already occurred earlier in the column\n", "num_dupl_descr = descriptions.duplicated().sum()\n", "uni_descr = descriptions.unique()\n", "num_uni_descr = len(uni_descr)\n", "\n", "print(f\"Number of duplicate descriptions: {num_dupl_descr}\")\n", "print(f\"Number of unique descriptions: {num_uni_descr}\")\n", "print(f\"Share of unique descriptions: {num_uni_descr / len(descriptions) * 
100:.2f} %\")" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "if not LOAD_CALC_FILES:\n", "    cols = ['descr', 'len', 'num_occur', 'assoc_obj_ids', 'num_assoc_obj_ids']\n", "    rows = []  # one entry per unique description; turned into a DataFrame once after the loop\n", "    max_val = 0  # highest occurrence count seen so far\n", "    text = None  # most frequent description\n", "    index = 0  # position of that description in uni_descr\n", "\n", "    for idx, description in enumerate(uni_descr):\n", "        len_descr = len(description)\n", "        # all work orders that carry exactly this description\n", "        filt = base['VorgangsBeschreibung'] == description\n", "        temp = base[filt]\n", "        # sorted unique object IDs this description is attached to\n", "        assoc_obj_ids = np.sort(temp['ObjektID'].unique(), kind='stable')\n", "        num_assoc_obj_ids = len(assoc_obj_ids)\n", "        num_dupl = filt.sum()\n", "\n", "        rows.append([description, len_descr, num_dupl, assoc_obj_ids, num_assoc_obj_ids])\n", "\n", "        if num_dupl > max_val:\n", "            max_val = num_dupl\n", "            index = idx\n", "            text = description\n", "\n", "    # building the DataFrame once avoids calling pd.concat in every iteration, which copies all rows collected so far\n", "    descr_df = pd.DataFrame(rows, columns=cols)\n", "    temp1 = descr_df.sort_values(by='num_occur', ascending=False)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " descr len num_occur \\\n", "161 Tägliche Wartungstätigkeiten nach Vorgabe des ... 66 92592 \n", "33 Wöchentliche Sichtkontrolle Reinigung 37 1654 \n", "130 Tägliche Überprüfung der Ölabscheider 37 1616 \n", "159 Wöchentliche Kontrolle der WC-Anlagen 37 1265 \n", "139 Halbjährliche Kontrolle des Stabbreithalters 44 687 \n", "... ... ... ... \n", "2665 Überprüfung der Y-Achse Schneidbrücke am LC 2 ... 176 1 \n", "2664 Luftschlauch muss ausgetauscht werden. Ist und... 195 1 \n", "2663 Riemenscheibe tauschen auf 650 UPM 34 1 \n", "2660 Durchführung: Sollwert: 20 0,1g 31 1 \n", "6752 Befestigung Deckel für Batteriefach defekt Hal... 99 1 \n", "\n", " assoc_obj_ids num_assoc_obj_ids \n", "161 [0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53... 206 \n", "33 [301, 304, 305, 313, 314, 331, 332, 510, 511, ... 18 \n", "130 [0, 970, 2134, 2137] 4 \n", "159 [1352, 1353, 1354, 1684, 1685, 1686, 1687, 168... 11 \n", "139 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6... 166 \n", "... ... ... \n", "2665 [20] 1 \n", "2664 [1] 1 \n", "2663 [74] 1 \n", "2660 [1746] 1 \n", "6752 [326] 1 \n", "\n", "[6753 rows x 5 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp1" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "# save/load dataframe\n", "FILE_PATH = 'VorgangsBeschreibung_analyse_1.fth'\n", "if LOAD_CALC_FILES:\n", " temp1 = pd.read_feather(FILE_PATH)\n", " temp1 = temp1.set_index('index')\n", "else:\n", " save_df = temp1.reset_index()\n", " save_df.to_feather(FILE_PATH)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "filt = temp1['descr'].str.contains('3-monatlich')" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " descr len num_occur \\\n", "476 3-monatliche Sichtkontrolle Reinigung 37 222 \n", "671 3-monatliche Kontrolle 22 20 \n", "303 3-monatliche Überprüfung durch Firma Siemens 44 16 \n", "1055 3-monatliche Kontrolle der Wasserfilter, bei B... 186 16 \n", "914 3-monatliche Überprüfung der Telefonanlage 42 14 \n", "281 3-monatliche Überprüfung der Torsprechanlage 44 14 \n", "280 3-monatliche Überprüfung der Sicherheitslichts... 111 14 \n", "658 3-monatliche Sichtkontrolle der Not- Sicherhei... 76 13 \n", "279 3-monatliche Überprüfung der Regalsicherungsan... 84 13 \n", "3234 3-monatliche Überprüfung der Personen-Überwach... 72 11 \n", "608 3-monatliche Kontrolle der optischen Alarmgebe... 61 10 \n", "137 3-monatliche Überprüfung des Abwassers durch P... 70 8 \n", "2704 3-monatliche Reinigung 22 8 \n", "995 3-monatliche Kontrolle der Erste-Hilfe-Kästen ... 103 7 \n", "3207 3-monatliche Sichtkontrolle der Mittelspannung... 89 5 \n", "3551 3-monatliche Überprüfung der Uhrenanlagen Betr... 84 4 \n", "2939 3-monatliche Sichtkontrolle der Starkstrom-Anl... 75 4 \n", "1001 3-monatliche Kontrolle des Seils eventueller A... 54 2 \n", "4054 3-monatliche Überprüfung des Abwassers durch P... 103 2 \n", "3587 3-monatliche Sichtkontrolle der optischen Alar... 66 2 \n", "4284 3-monatliche Überprüfung des Abwassers durch P... 142 1 \n", "6157 3-monatliche Überprüfung des Abwassers durch P... 125 1 \n", "5333 3-monatliche Kontrolle des Seils eventueller A... 995 1 \n", "5743 3-monatliche Überprüfung durch Firma Siemens. ... 110 1 \n", "\n", " assoc_obj_ids num_assoc_obj_ids \n", "476 [883, 1196, 1197, 1198, 1199, 1201, 1202, 1203... 
18 \n", "671 [2021, 2045] 2 \n", "303 [2029] 1 \n", "1055 [1175, 1176] 2 \n", "914 [2035] 1 \n", "281 [2037] 1 \n", "280 [2046] 1 \n", "658 [2042] 1 \n", "279 [2047] 1 \n", "3234 [2040] 1 \n", "608 [2041] 1 \n", "137 [958] 1 \n", "2704 [903, 905] 2 \n", "995 [2456] 1 \n", "3207 [2026] 1 \n", "3551 [2034] 1 \n", "2939 [2021] 1 \n", "1001 [838] 1 \n", "4054 [958] 1 \n", "3587 [2041] 1 \n", "4284 [958] 1 \n", "6157 [958] 1 \n", "5333 [838] 1 \n", "5743 [2029] 1 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test2 = temp1.loc[filt,:]\n", "test2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "def pre_clean_spell_check(string: str) -> str:\n", " \n", " for char in SPELL_CHECK_NON_CHARS:\n", " string = string.replace(char, ' ')\n", " \n", " # remove spaces at the beginning and the end\n", " string = string.strip()\n", " \n", " return string\n", "\n", "\n", "test = temp1['descr'].map(pre_clean_spell_check)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "objs = temp1.loc[140, 'assoc_obj_ids']" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([7], dtype=int64)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "objs" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " VorgangsID ObjektID HObjektText ObjektArtID \\\n", "166 111649 7 00007, Ausrüstung 2, 2 \n", "222 133856 7 00007, Ausrüstung 2, 2 \n", "240 140704 7 00007, Ausrüstung 2, 2 \n", "437 123811 7 00007, Ausrüstung 2, 2 \n", "439 107885 7 00007, Ausrüstung 2, 2 \n", "... ... ... ... ... \n", "128396 531424 7 00007, Ausrüstung 2, 2 \n", "128446 530613 7 00007, Ausrüstung 2, 2 \n", "128563 580234 7 00007, Ausrüstung 2, 2 \n", "128636 586208 7 00007, Ausrüstung 2, 2 \n", "128915 261786 7 00007, Ausrüstung 2, 2 \n", "\n", " ObjektArtText VorgangsTypID VorgangsTypName \\\n", "166 Beschichtungsmaschinen 3 Reparaturauftrag (Portal) \n", "222 Beschichtungsmaschinen 3 Reparaturauftrag (Portal) \n", "240 Beschichtungsmaschinen 3 Reparaturauftrag (Portal) \n", "437 Beschichtungsmaschinen 3 Reparaturauftrag (Portal) \n", "439 Beschichtungsmaschinen 1 Wartung \n", "... ... ... ... \n", "128396 Beschichtungsmaschinen 3 Reparaturauftrag (Portal) \n", "128446 Beschichtungsmaschinen 1 Wartung \n", "128563 Beschichtungsmaschinen 3 Reparaturauftrag (Portal) \n", "128636 Beschichtungsmaschinen 3 Reparaturauftrag (Portal) \n", "128915 Beschichtungsmaschinen 1 Wartung \n", "\n", " VorgangsDatum VorgangsStatusId VorgangsPrioritaet \\\n", "166 2021-03-17 5 1 \n", "222 2021-07-27 5 1 \n", "240 2021-09-29 5 1 \n", "437 2021-05-06 5 1 \n", "439 2021-07-01 5 1 \n", "... ... ... ... \n", "128396 2023-05-08 5 1 \n", "128446 2023-05-30 5 1 \n", "128563 2023-05-30 5 1 \n", "128636 2023-06-12 5 1 \n", "128915 2023-05-30 5 1 \n", "\n", " VorgangsBeschreibung VorgangsOrt \\\n", "166 -Bereich Aufwicklung, Bogenwalze Madenschraube... NaN \n", "222 2 Keilriemen gerissen- Bereich Abluftventilato... NaN \n", "240 Wir benötigen einen weiteren Ersatzteilschrank... NaN \n", "437 bitte dringend 10l Eimer zum Silikon versenden... NaN \n", "439 Monatliche Kontrolle des Flusen-Absaugrohrs NaN \n", "... ... ... \n", "128396 KKT Chiller Auslauf Störung. Füllstand Min. 
STOP NaN \n", "128446 Monatliche Überprüfung der Gasleitung mit dem ... NaN \n", "128563 Mischer für Beschichtungsanlage bitte ausbrenn... NaN \n", "128636 Haken im Kran Auslauf defekt NaN \n", "128915 Kontrolle der Risiko-Ersatzteile NaN \n", "\n", " VorgangsArtText ErledigungsDatum \\\n", "166 Walze (mechanischer Defekt) 2021-03-17 \n", "222 Antriebsriemen (Keilriemen / Zahnriemen / Flac... 2021-07-27 \n", "240 Allgemeine Reparaturarbeiten 2021-10-04 \n", "437 Maschineninfrastruktur 2021-05-06 \n", "439 Maschinen-Wartung monatlich 2021-06-28 \n", "... ... ... \n", "128396 Allgemeine Reparaturarbeiten 2023-05-08 \n", "128446 01 Interne Reinigung / Pflege / Überprüfung 2023-06-05 \n", "128563 Allgemeine Reparaturarbeiten 2023-05-30 \n", "128636 Allgemeine Reparaturarbeiten 2023-06-12 \n", "128915 Überprüfung Risikoersatzteile 2023-05-30 \n", "\n", " ErledigungsArtText \\\n", "166 Intern UTT - Reparatur \n", "222 Intern UTT - Reparatur \n", "240 Intern UTT - Montage \n", "437 Intern UTT - Wartung \n", "439 Intern UTT - Sichtkontrolle \n", "... ... \n", "128396 Intern UTT - Reparatur \n", "128446 Intern UTT - Prüfung \n", "128563 Intern UTT - Reparatur \n", "128636 Intern UTT - Reparatur \n", "128915 Intern UTT - Dokumentenkontrolle \n", "\n", " ErledigungsBeschreibung \\\n", "166 Madenschrauben angezogen \n", "222 die Keilriemen SPA 1282 aus Neu Ulm geholt und... \n", "240 Wurde montiert \n", "437 NaN \n", "439 NaN \n", "... ... \n", "128396 Kühlflüssigkeit aufgefüllt und Filter gewechse... \n", "128446 Dichtheitsprüfung der Gasleitungen \n", "128563 erledigt \n", "128636 Haken getauscht \n", "128915 erledigt.\\n \n", "\n", " MPMelderArbeitsplatz MPAbteilungBezeichnung Arbeitsbeginn \\\n", "166 Ausrüstung Ausrüstung 2021-03-17 \n", "222 Ausrüstung Ausrüstung 2021-07-27 \n", "240 Ausrüstung Ausrüstung 2021-10-04 \n", "437 Ausrüstung Ausrüstung 2021-05-06 \n", "439 NaN NaN NaT \n", "... ... ... ... 
\n", "128396 Ausrüstung 2 Ausrüstung 2023-05-08 \n", "128446 NaN NaN 2023-06-05 \n", "128563 Ausrüstung 2, Kombianlage Ausrüstung 2023-05-30 \n", "128636 Ausrüstung Ausrüstung 2023-06-12 \n", "128915 NaN NaN 2023-05-30 \n", "\n", " ErstellungsDatum \n", "166 2021-03-17 \n", "222 2021-07-27 \n", "240 2021-09-29 \n", "437 2021-05-06 \n", "439 2021-03-03 \n", "... ... \n", "128396 2023-05-08 \n", "128446 2023-04-24 \n", "128563 2023-05-30 \n", "128636 2023-06-12 \n", "128915 2022-06-30 \n", "\n", "[272 rows x 20 columns]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "base.loc[base['ObjektID'] == objs[0],:]" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " descr len num_occur \\\n", "161 Tägliche Wartungstätigkeiten nach Vorgabe des ... 66 92592 \n", "33 Wöchentliche Sichtkontrolle Reinigung 37 1654 \n", "130 Tägliche Überprüfung der Ölabscheider 37 1616 \n", "159 Wöchentliche Kontrolle der WC-Anlagen 37 1265 \n", "139 Halbjährliche Kontrolle des Stabbreithalters 44 687 \n", "... ... ... ... \n", "2665 Überprüfung der Y-Achse Schneidbrücke am LC 2 ... 176 1 \n", "2664 Luftschlauch muss ausgetauscht werden. Ist und... 195 1 \n", "2663 Riemenscheibe tauschen auf 650 UPM 34 1 \n", "2660 Durchführung: Sollwert: 20 0,1g 31 1 \n", "6752 Befestigung Deckel für Batteriefach defekt Hal... 99 1 \n", "\n", " assoc_obj_ids num_assoc_obj_ids \n", "161 [0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53... 206 \n", "33 [301, 304, 305, 313, 314, 331, 332, 510, 511, ... 18 \n", "130 [0, 970, 2134, 2137] 4 \n", "159 [1352, 1353, 1354, 1684, 1685, 1686, 1687, 168... 11 \n", "139 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6... 166 \n", "... ... ... 
\n", "2665 [20] 1 \n", "2664 [1] 1 \n", "2663 [74] 1 \n", "2660 [1746] 1 \n", "6752 [326] 1 \n", "\n", "[6753 rows x 5 columns]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp1" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Tägliche Wartungstätigkeiten nach Vorgabe des Maschinenherstellers'" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp1.iat[0,0]" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Wöchentliche Sichtkontrolle Reinigung'" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp1.iat[1,0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### spaCy" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Durchführung: Sollwert: 20 0,1g'" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "string = temp1.iloc[-2,0]\n", "#string = temp1.iloc[0,0]\n", "string" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "string = 'Ich spiele jeden Tag mit den Kindern im Garten. 
Das ist schön.'\n", "string = 'Die Maschine XYZ ist aufgrund einer Störung im Druckluftsystem defekt.'\n", "#string = 'Wir benötigen das Werkzeug von Herr Stöppel, um das derzeit abzuarbeiten.Dies wird durch Herrn Strebe getan.'" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "doc = nlp(string)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(11, 11)\n" ] }, { "data": { "text/plain": [ "array([[ 0, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3],\n", " [ 0, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3],\n", " [ 0, 0, 2, 3, 3, 3, 3, 3, 3, 3, 3],\n", " [ 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3],\n", " [ 0, 0, 0, 0, 4, 4, 4, 4, 4, 3, 3],\n", " [ 0, 0, 0, 0, 0, 5, 6, 6, 6, 3, 3],\n", " [ 0, 0, 0, 0, 0, 0, 6, 6, 6, 3, 3],\n", " [ 0, 0, 0, 0, 0, 0, 0, 7, 7, 3, 3],\n", " [ 0, 0, 0, 0, 0, 0, 0, 0, 8, 3, 3],\n", " [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 3],\n", " [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10]])" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lca_matrix = doc.get_lca_matrix()\n", "print(lca_matrix.shape)\n", "lca_matrix = np.triu(lca_matrix)\n", "lca_matrix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"nested children:\n", "- [x] Weighting by number of occurrences\n", "- [x] AUX words: possibly link all associated words with each other\n", "- [ ] Dual link between two words of a tree (useful?)\n", " - not really useful, since a single connection is already accounted for by the weight\n", " - in the end, the weight of every connection would simply be doubled" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "# simulate occurrence counter\n", "OCC_COUNTER = 10" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "SPELL_CHECK_NON_CHARS = set([' ', '.', ',', ';', ':', '-'])\n", "\n", "def pre_clean_word(string: str) -> str:\n", " \n", " # keep only letters (incl. German umlauts and ß); digits, punctuation and spaces are removed\n", " pattern = r'[^A-Za-zäöüÄÖÜß]+'\n", " string = re.sub(pattern, '', string)\n", " \"\"\"\n", " for char in SPELL_CHECK_NON_CHARS:\n", " string = string.replace(char, '')\n", " \"\"\"\n", " \n", " return string\n", "\n", "# https://stackoverflow.com/questions/25341945/check-if-string-has-date-any-format \n", "def is_str_date(string, fuzzy=False):\n", " \n", " # dateutil raises ParserError (a ValueError subclass) for unparsable input\n", " # and OverflowError if the parsed date exceeds the largest valid C integer\n", " try:\n", " parse(string, fuzzy=fuzzy)\n", " return True\n", " except (ValueError, OverflowError):\n", " return False\n", "\n", "\n", "def obtain_sub_tree(token):\n", " # collect all tokens of the token's subtree, excluding the token itself\n", " descendants = list(token.subtree)\n", " descendants.remove(token)\n", " logger.debug(f'Token >>{token}<< has subtree >>{descendants}<<')\n", " return descendants\n", "\n", "\n", "def add_children_descendants(\n", " parent,\n", " weight,\n", " connections,\n", " unique_tokens,\n", " children_sents,\n", "):\n", " # use (lemma, POS) of the parent as key\n", " if (parent.lemma_, parent.pos_) in connections:\n", " connections[(parent.lemma_, 
parent.pos_)].append(children_sents)\n", " #connections[parent.lemma_].append([descendant.lemma_, descendant])\n", " else:\n", " # do not add auxiliary words\n", " if parent.pos_ != 'AUX':\n", " unique_tokens.add(parent.lemma_)\n", " connections[(parent.lemma_, parent.pos_)] = list()\n", " connections[(parent.lemma_, parent.pos_)].append(children_sents)\n", " 
#connections[parent.lemma_].append([descendant.lemma_, descendant])\n", " \n", " return None\n", "\n", "\n", "def obtain_descendant_info(\n", " doc,\n", " weight,\n", " POS_of_interest,\n", " TAG_of_interest,\n", " connections,\n", " unique_tokens,\n", " spell_check_candidates,\n", " spell_check_whitelist,\n", " spell_checker,\n", " corrections,\n", "):\n", " global GENERAL_BLACKLIST\n", " \n", " # iterate over sentences\n", " for sent in doc.sents:\n", " # spell check list\n", " spell_check_words = list()\n", " \n", " # iterate over tokens in one sentence\n", " for token in sent:\n", " \n", " if not (token.pos_ in POS_of_interest or token.tag_ in TAG_of_interest):\n", " continue\n", " elif token.lemma_.lower() in GENERAL_BLACKLIST:\n", " logger.debug(f'Eliminated parent >>{token}<< because of blacklist')\n", " continue\n", " \n", " # spell check\n", " if token.lemma_.lower() not in spell_check_whitelist:\n", " word = pre_clean_word(string=token.lemma_.lower())\n", " if word in corrections:\n", " word = corrections[word]\n", " elif not word.isdigit():\n", " spell_check_words.append(word)\n", " \n", " descendants = obtain_sub_tree(token=token)\n", " \n", " # iterate over all children if there are any (obtain_sub_tree always returns a list)\n", " if descendants:\n", " # list with all children in the current sentence\n", " children_sents = list()\n", " \n", " for child in descendants:\n", " logger.debug(f'Token is >>{token}<< with child >>{child}<< and POS {child.pos_}')\n", " \n", " # eliminate cross-references between verbs (AUX/VERB parent and AUX/VERB child)\n", " if ((token.pos_ == 'AUX' or token.pos_ == 'VERB') and\n", " (child.pos_ == 'AUX' or child.pos_ == 'VERB')):\n", " continue\n", " elif not (child.pos_ in POS_of_interest or child.tag_ in TAG_of_interest):\n", " continue\n", " elif child.lemma_.lower() in GENERAL_BLACKLIST:\n", " logger.debug(f'Eliminated child >>{child}<< because of blacklist')\n", " continue\n", " \n", " if (child not in 
DESC_BLACKLIST and\n", " not is_str_date(string=child.text)):\n", " 
children_sents.append((child.lemma_, weight))\n", " \n", " if child.lemma_ not in unique_tokens:\n", " unique_tokens.add(child.lemma_)\n", " \n", " if child.lemma_.lower() not in spell_check_whitelist:\n", " word = pre_clean_word(string=child.lemma_.lower())\n", " if word in corrections:\n", " word = corrections[word]\n", " elif not word.isdigit():\n", " spell_check_words.append(word)\n", " \n", " # add list of children for current parent if not empty\n", " if children_sents:\n", " add_children_descendants(\n", " parent=token,\n", " weight=weight,\n", " connections=connections,\n", " unique_tokens=unique_tokens,\n", " children_sents=children_sents,\n", " )\n", "\n", " misspelled_candidates = spell_checker.unknown(spell_check_words)\n", " spell_check_candidates.update(misspelled_candidates)\n", " \n", " \n", " return None" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Token Lemma POS Tag Dep\n", "0 Die der DET ART nk\n", "1 Maschine Maschine NOUN NN sb\n", "2 XYZ XYZ PROPN NE nk\n", "3 ist sein AUX VAFIN ROOT\n", "4 aufgrund aufgrund ADP APPR mo\n", "5 einer ein DET ART nk\n", "6 Störung Störung NOUN NN nk\n", "7 im in ADP APPRART mnr\n", "8 Druckluftsystem Druckluftsystem NOUN NN nk\n", "9 defekt defekt ADV ADJD pd\n", "10 . -- PUNCT $. punct" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame({\"Token\": [token.text for token in doc],\n", " \"Lemma\": [token.lemma_ for token in doc],\n", " \"POS\": [token.pos_ for token in doc],\n", " \"Tag\": [token.tag_ for token in doc],\n", " \"Dep\": [token.dep_ for token in doc]})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "def obtain_adj_matrix(unique_tokens, connections):\n", "\n", " adj_mat = pd.DataFrame(\n", " data=0, \n", " columns=list(unique_tokens), \n", " index=list(unique_tokens),\n", " dtype=np.uint32,\n", " )\n", " \n", " for (pred, POS), descendants_list in connections.items():\n", " #print(f'{pred=}, {descendants=}')\n", " \n", " for descendants in descendants_list:\n", " #print(f'{descendants}')\n", " \n", " if POS != 'AUX':\n", " for (desc, weight) in descendants:\n", " adj_mat.at[pred, desc] += weight\n", " \n", " else:\n", " if len(descendants) > 1:\n", " # if auxiliary word, make connection between all associated words\n", " combs = combinations(descendants, r=2)\n", " \n", " for comb in combs:\n", " # comb is tuple ((word_1, weight), (word_2, weight))\n", " weight = comb[0][1]\n", " word_1 = comb[0][0]\n", " word_2 = comb[1][0]\n", " \n", " \"\"\"\n", " if ((word_1 == 'Eigenverantwortlichkeit' or word_1 == 'neu') and\n", " (word_2 == 'Eigenverantwortlichkeit' or word_2 == 'neu')):\n", " print(f'Hello from {pred=} with {descendants=}')\n", " 
\"\"\"\n", " \n", " adj_mat.at[word_1, word_2] += weight\n", " \n", " \n", " return adj_mat\n", "\n", "\n", "def make_undir_adj_matrix(adj_mat):\n", " \n", " adj_mat_undir = adj_mat.copy()\n", " arr = adj_mat_undir.to_numpy()\n", " # fold onto the upper triangle: add the transposed strictly lower\n", " # triangle (k=-1) so that diagonal entries are not counted twice\n", " arr_upper = np.triu(arr)\n", " arr_lower = np.tril(arr, k=-1).T\n", " arr_new = arr_lower + arr_upper\n", " \n", " adj_mat_undir.loc[:] = arr_new\n", " \n", " return adj_mat_undir" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "spacy.displacy.render(doc)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Full dataset" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "# analyse all descriptions (uncomment the slice below to use a subset)\n", "descr = temp1[['descr', 'num_occur']]\n", "#descr = descr.iloc[50:200,:]" ] }, {
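"cell_type": "markdown", "metadata": {}, "source": [ "
The checklist above (weighting by number of occurrences, pairwise links between the children of AUX words, no dual links) together with `obtain_adj_matrix` and `make_undir_adj_matrix` can be condensed into a small, self-contained NumPy sketch. It is an illustration under assumptions, not the notebook's own code: `build_adjacency` and `fold_to_undirected` are hypothetical names, the toy input only mimics the shape of `connections` (keys `(lemma, POS)`, values are lists of `(child_lemma, weight)` lists), and a plain array indexed through a vocabulary list stands in for the labelled pandas DataFrame.

```python
from itertools import combinations

import numpy as np


def build_adjacency(connections, vocab):
    # Hypothetical helper (not part of the notebook) mirroring obtain_adj_matrix.
    # connections: {(parent_lemma, parent_pos): [[(child_lemma, weight), ...], ...]}
    # vocab: list of all lemmas used as matrix rows/columns (AUX parents excluded)
    idx = {word: i for i, word in enumerate(vocab)}
    adj = np.zeros((len(vocab), len(vocab)), dtype=np.uint32)
    for (parent, pos), children_lists in connections.items():
        for children in children_lists:
            if pos != 'AUX':
                # regular parent: directed edge parent -> child, weighted by occurrences
                for child, weight in children:
                    adj[idx[parent], idx[child]] += weight
            else:
                # auxiliary parent (e.g. 'sein'): link every pair of its children instead
                for (word_1, weight), (word_2, _) in combinations(children, r=2):
                    adj[idx[word_1], idx[word_2]] += weight
    return adj


def fold_to_undirected(adj):
    # Hypothetical helper: keep the upper triangle and add the transposed strictly
    # lower triangle (k=-1), so each pair ends up in one cell and the diagonal is not doubled
    return np.triu(adj) + np.tril(adj, k=-1).T


# toy input in the shape of `connections`; the weights play the role of num_occur
vocab = ['Maschine', 'Störung', 'defekt']
connections = {
    ('Maschine', 'NOUN'): [[('Störung', 10)]],
    ('Störung', 'NOUN'): [[('Maschine', 5)]],
    ('sein', 'AUX'): [[('Maschine', 10), ('defekt', 10)]],
}
adj = build_adjacency(connections, vocab)
print(fold_to_undirected(adj).tolist())  # -> [[0, 15, 10], [0, 0, 0], [0, 0, 0]]
```

Because both directions of a word pair land in the same upper-triangle cell after folding (here 10 + 5 = 15 for Maschine/Störung), an additional dual link per pair would only double every weight, which matches the reasoning in the checklist.
" ] }, {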
"cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "#descr.iat[0,0] = 'Das ist ein Test am 24.08.2023'" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6753" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(descr)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " descr num_occur\n", "161 Tägliche Wartungstätigkeiten nach Vorgabe des ... 92592\n", "33 Wöchentliche Sichtkontrolle Reinigung 1654\n", "130 Tägliche Überprüfung der Ölabscheider 1616\n", "159 Wöchentliche Kontrolle der WC-Anlagen 1265\n", "139 Halbjährliche Kontrolle des Stabbreithalters 687\n", "... ... ...\n", "2665 Überprüfung der Y-Achse Schneidbrücke am LC 2 ... 1\n", "2664 Luftschlauch muss ausgetauscht werden. Ist und... 1\n", "2663 Riemenscheibe tauschen auf 650 UPM 1\n", "2660 Durchführung: Sollwert: 20 0,1g 1\n", "6752 Befestigung Deckel für Batteriefach defekt Hal... 1\n", "\n", "[6753 rows x 2 columns]" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "descr" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "#LOAD_CALC_FILES = True\n", "#LOAD_CALC_FILES = False\n", "#IS_TEST = True\n", "IS_TEST = False" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "spell_check_whitelist = {\n", " '',\n", " 'beschlag',\n", " 'brandschutztechnische',\n", " 'dichtung',\n", " 'festhaltevorrichtung',\n", " 'funktion',\n", " 'halbjährliche',\n", " 'kontrolle',\n", " 'maschinenhersteller',\n", " 'prüfung',\n", " 'reinigung',\n", " 'scharnier',\n", " 'schließvorrichtung',\n", " 'schmierung',\n", " 'sichtkontrolle',\n", " 'stabbreithalter',\n", " 'technikrundgang',\n", " 'vorgabe',\n", " 'wartungstätigkeit',\n", " 'wcanlage',\n", " 'ölabscheider',\n", " 'abarbeiten',\n", " 'abgleichen',\n", " 'abschmieren',\n", " 'abschmierung',\n", " 'abteilungsleiter',\n", " 'akku',\n", " 'analyse',\n", " 'arbeitsplan',\n", " 'aschenbecher',\n", " 'auffüllen',\n", " 'auflistung',\n", " 'befestigungsschraube',\n", " 'beschädigung',\n", " 'betriebsstunde',\n", " 'blombe',\n", " 'blombieren',\n", " 'brückner',\n", " 'campenabwickler',\n", " 'campenaufwickler',\n", " 'desinfektionsmittel',\n", " 'dichtigkeit',\n", " 'druckkontrolle',\n", 
" 'efficiosystem',\n", " 'eigenverantwortlichkeit',\n", " 'einrichtung',\n", " 'email',\n", " 'erledigungsdatum',\n", " 'extradate',\n", " 'extradatum',\n", " 'filter',\n", " 'firma',\n", " 'formplatte',\n", " 'frostprävention',\n", " 'gegendruckbolze',\n", " 'gesamtanlage',\n", " 'heizungsanlage',\n", " 'keller',\n", " 'kesselhauskontrolle',\n", " 'kesselwasser',\n", " 'koffer',\n", " 'kompensator',\n", " 'kompressorstation',\n", " 'kondensat',\n", " 'kühlturm',\n", " 'kühltürme',\n", " 'lager',\n", " 'laserabteilung',\n", " 'leckage',\n", " 'leerung',\n", " 'leiterprüfung',\n", " 'linearkugellager',\n", " 'luftdruckkontrolle',\n", " 'magazin',\n", " 'maschinenbediener',\n", " 'messwert',\n", " 'monat',\n", " 'motor',\n", " 'papiermüllbehälter',\n", " 'personalbüro',\n", " 'pflasterschrank',\n", " 'rieme',\n", " 'rollenkette',\n", " 'rundgang',\n", " 'schweißkopf',\n", " 'schweisskopf',\n", " 'sichtprüfung',\n", " 'speisewasser',\n", " 'sprinkleranlage',\n", " 'temperatursensor',\n", " 'terminieren',\n", " 'ticket',\n", " 'trommel',\n", " 'täglicher',\n", " 'uvröhre',\n", " 'ventilator',\n", " 'verbandsmaterial',\n", " 'verschleiß',\n", " 'verschleiss',\n", " 'vorbelegung',\n", " 'wartung',\n", " 'wartungsarbeit',\n", " 'wartungsplan',\n", " 'wasseraufbereitung',\n", " 'wasseraufbereitungsanlage',\n", " 'wasserverbrauch',\n", " 'weberei',\n", " 'wumagtrockner',\n", " 'wäscherkontrolle',\n", " 'wöchig',\n", " 'abdichten',\n", " 'abfluprüfung',\n", " 'ablesen',\n", " 'abluftkanal',\n", " 'absauganlage',\n", " 'abspeichern',\n", " 'absprache',\n", " 'aktivkohlepatron',\n", " 'aktivkohlepatrone',\n", " 'anbackung',\n", " 'anfragen',\n", " 'angebot',\n", " 'anpresswalze',\n", " 'ansaug',\n", " 'anschluss',\n", " 'anschluß',\n", " 'anzahl',\n", " 'auen',\n", " 'auenbereich',\n", " 'aueneinheit',\n", " 'aufwickler',\n", " 'ausblasöffnung',\n", " 'ausbrennen',\n", " 'auslassventil',\n", " 'ausrüstung',\n", " 'austausch',\n", " 'axialpendelrollenlager',\n", " 
'batteriewechsel',\n", " 'batterieüberprüfung',\n", " 'baugruppe',\n", " 'baumwolltuch',\n", " 'bauteil',\n", " 'befeuchter',\n", " 'beleuchtung',\n", " 'beschichtunglegierung',\n", " 'besprechungszimmer',\n", " 'bestandskontrolle',\n", " 'bestellformular',\n", " 'bestätigung',\n", " 'bezeichnung',\n", " 'binder',\n", " 'blutstop',\n", " 'bolze',\n", " 'breitstreckwalze',\n", " 'containerstellfläche',\n", " 'contrawalze',\n", " 'dachfläche',\n", " 'dampfzylinder',\n", " 'deformierung',\n", " 'dezember',\n", " 'din',\n", " 'docke',\n", " 'dokumentation',\n", " 'dosierpumpe',\n", " 'druckluftbehälter',\n", " 'druckluftleitung',\n", " 'druckluftschläuche',\n", " 'drucktestkontrolle',\n", " 'einterminieren',\n", " 'eintragung',\n", " 'einzelprotokoll',\n", " 'einziehwalze',\n", " 'elektisch',\n", " 'element',\n", " 'enthärtung',\n", " 'entwässern',\n", " 'erledigungsbeschreibeung',\n", " 'erstehilfeeinrichtung',\n", " 'erweiterung',\n", " 'explosionsschutzanlage',\n", " 'extradaten',\n", " 'exzenterringbefestigung',\n", " 'fa',\n", " 'fach',\n", " 'faltenbalge',\n", " 'feedbackinput',\n", " 'feuerwehrumfahrung',\n", " 'filert',\n", " 'filteranlage',\n", " 'filterelement',\n", " 'filterstufe',\n", " 'fixtermin',\n", " 'flanschlager',\n", " 'flanschlagerquadrat',\n", " 'fluchtwegsymbol',\n", " 'flusenabsaugrohr',\n", " 'freilauf',\n", " 'fremdkörper',\n", " 'führungswagen',\n", " 'gaslager',\n", " 'gaszählerstand',\n", " 'gatter',\n", " 'geräteinner',\n", " 'geräteinneres',\n", " 'geräusch',\n", " 'gesamt',\n", " 'gesamterzeugt',\n", " 'getränkeautomat',\n", " 'gewindebefestigung',\n", " 'gewindestiftbefestigung',\n", " 'gleitschiene',\n", " 'grat',\n", " 'gro',\n", " 'grundplatte',\n", " 'halle',\n", " 'haupteingang',\n", " 'hebebühne',\n", " 'hebezeug',\n", " 'helm',\n", " 'hersteller',\n", " 'hochregal',\n", " 'hochtemperatur',\n", " 'hochtemperatureinsatz',\n", " 'hydraulik',\n", " 'hydrauliköl',\n", " 'impulseingang',\n", " 'indikator',\n", " 'inneneinheit',\n", " 
'insektenvernichter',\n", " 'kabel',\n", " 'kammer',\n", " 'karton',\n", " 'kegelradgetriebe',\n", " 'kegelradgetriebemotor',\n", " 'kette',\n", " 'klemmrolle',\n", " 'klimaanlage',\n", " 'klimabühne',\n", " 'klimagerät',\n", " 'kompressor',\n", " 'kompressorluftwert',\n", " 'kontoll',\n", " 'kontrawalze',\n", " 'kontroll',\n", " 'krankheit',\n", " 'krän',\n", " 'kräne',\n", " 'kuehlaggregat',\n", " 'kw',\n", " 'kühlgerät',\n", " 'lagereinheit',\n", " 'lagereinsatz',\n", " 'lagerort',\n", " 'lagerung',\n", " 'laser',\n", " 'laufgeräusche',\n", " 'luftansaugseite',\n", " 'luftfilter',\n", " 'luftfilterwasserabscheider',\n", " 'luftmenge',\n", " 'luftreiniger',\n", " 'lösungsmittel',\n", " 'lüftungsanlage',\n", " 'macke',\n", " 'managementsystem',\n", " 'maschinenanschluss',\n", " 'materialzersetzung',\n", " 'messlager',\n", " 'micron',\n", " 'mischer',\n", " 'monatlicher',\n", " 'monatliches',\n", " 'monteur',\n", " 'moos',\n", " 'motorstart',\n", " 'nachfetten',\n", " 'nachschmieren',\n", " 'nachspann',\n", " 'neuvertrag',\n", " 'nord',\n", " 'nottelefon',\n", " 'nr',\n", " 'oberer',\n", " 'oberflächenkontrolle',\n", " 'objektkarte',\n", " 'palette',\n", " 'pendelkugellager',\n", " 'pfeifer',\n", " 'platine',\n", " 'pneum',\n", " 'pneumatikventil',\n", " 'pneumatisch',\n", " 'pos',\n", " 'positioniersystem',\n", " 'prozesskennzahl',\n", " 'prüfbericht',\n", " 'prüfplan',\n", " 'rampenbereich',\n", " 'rauwalze',\n", " 'regalprüfer',\n", " 'regalsicherungsanlage',\n", " 'reiniger',\n", " 'reinigungstuch',\n", " 'restlich',\n", " 'risikoersatzteil',\n", " 'rohrtrenner',\n", " 'roller',\n", " 'rundgangkontrollen',\n", " 'rückmeldung',\n", " 'sae',\n", " 'sauberkeit',\n", " 'schlitten',\n", " 'schmierstoff',\n", " 'schmierstoffmenge',\n", " 'schneider',\n", " 'schraube',\n", " 'schraubenbestand',\n", " 'schutzabdeckung',\n", " 'sicherheitsbeleuchtung',\n", " 'sicherheitseinrichtung',\n", " 'sicherheitslichtschranke',\n", " 'sicherheitsweste',\n", " 'sicherstellung',\n", 
" 'sonotrode',\n", " 'sonotrodenständer',\n", " 'spannkopflager',\n", " 'spannlager',\n", " 'spannrahmen',\n", " 'spindel',\n", " 'spindelhubgetriebe',\n", " 'spindelmutter',\n", " 'spülzeitprüfung',\n", " 'stab',\n", " 'stadtwasser',\n", " 'stehlager',\n", " 'stehlagergehäuse',\n", " 'steuerung',\n", " 'stückliste',\n", " 'systemumstellung',\n", " 'telefonanlage',\n", " 'telefonat',\n", " 'termin',\n", " 'terminabsprache',\n", " 'terminiern',\n", " 'terminiert',\n", " 'terminierung',\n", " 'terminvorschlag',\n", " 'testomat',\n", " 'thermoheizelement',\n", " 'torsprechanlage',\n", " 'trinkwassernetz',\n", " 'trockenzylinder',\n", " 'tänzerrolle',\n", " 'türdichtung',\n", " 'türgriff',\n", " 'türsicherung',\n", " 'umlenkwalzen',\n", " 'umrandung',\n", " 'unkraut',\n", " 'uschienenführung',\n", " 'uvv',\n", " 'ventil',\n", " 'verbaut',\n", " 'verbrennungsset',\n", " 'vereinbarung',\n", " 'verkalkung',\n", " 'verschleiteileinsatz',\n", " 'verschmutzung',\n", " 'verschmutzungenlos',\n", " 'verstellung',\n", " 'verunreinigung',\n", " 'vollständigkeit',\n", " 'volumenzähler',\n", " 'vorderer',\n", " 'vordruck',\n", " 'vorfilter',\n", " 'vorfilterflie',\n", " 'vorliegen',\n", " 'vormonat',\n", " 'wartungsintervall',\n", " 'wartungsvertrag',\n", " 'wasserfilter',\n", " 'wasserhärte',\n", " 'wasserpegelkontrolle',\n", " 'wasserzählerstand',\n", " 'wechselintervall',\n", " 'wärmetauscher',\n", " 'zahnrieme',\n", " 'zahnstange',\n", " 'zuleitung',\n", " 'zuschicken',\n", " 'ölfüllung',\n", " 'ölstand',\n", " 'ölstandsichtprüfung',\n", " 'ölstandskontrolle',\n", " 'überziehen'\n", "}" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [ "corrections: dict[str, str] = {\n", " 'desifektionsmittel': 'desinfektionsmittel',\n", " 'schweikopf': 'schweisskopf',\n", "}" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "INFO:base:Number of entries processed: 
1, Percent completed: 0.01\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "INFO:base:Number of entries processed: 501, Percent completed: 7.42\n", "INFO:base:Number of entries processed: 1001, Percent completed: 14.82\n", "INFO:base:Number of entries processed: 1501, Percent completed: 22.23\n", "INFO:base:Number of entries processed: 2001, Percent completed: 29.63\n", "INFO:base:Number of entries processed: 2501, Percent completed: 37.04\n", "INFO:base:Number of entries processed: 3001, Percent completed: 44.44\n", "INFO:base:Number of entries processed: 3501, Percent completed: 51.84\n", "INFO:base:Number of entries processed: 4001, Percent completed: 59.25\n", "INFO:base:Number of entries processed: 4501, Percent completed: 66.65\n", "INFO:base:Number of entries processed: 5001, Percent completed: 74.06\n", "INFO:base:Number of entries processed: 5501, Percent completed: 81.46\n", "INFO:base:Number of entries processed: 6001, Percent completed: 88.86\n", "INFO:base:Number of entries processed: 6501, Percent completed: 96.27\n" ] } ], "source": [ "# collect weighted token connections (graph edges) for the adjacency matrix\n", "connections = dict()\n", "unique_tokens = set()\n", "UPDATE_STATUS = 500  # log progress every UPDATE_STATUS descriptions\n", "length_data = len(descr)\n", "spell_check_candidates = set()\n", "spell_checker = SpellChecker(language='de', distance=1)\n", "\n", "if not LOAD_CALC_FILES or IS_TEST:\n", "    for count, (_, row) in enumerate(descr.iterrows()):\n", "        text = row['descr']\n", "        weight = row['num_occur']  # how often this description occurs\n", "\n", "        doc = nlp(text)\n", "\n", "        obtain_descendant_info(\n", "            doc=doc,\n", "            weight=weight,\n", "            POS_of_interest=POS_of_interest,\n", "            TAG_of_interest=TAG_of_interest,\n", "            connections=connections,\n", "            unique_tokens=unique_tokens,\n", "            spell_check_candidates=spell_check_candidates,\n", "            spell_check_whitelist=spell_check_whitelist,\n", "            
spell_checker=spell_checker,\n", "            corrections=corrections,\n", "        )\n", "\n", "        if count % UPDATE_STATUS == 0:\n", "            logger.info(f'Number of entries processed: {count+1}, Percent completed: {((count+1) / length_data) * 100:.2f}')" ] },
{ "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "ADJ_DF_PATH = './Graphanalyse/adj_mat_df.fth'\n", "if not IS_TEST:\n", "    if LOAD_CALC_FILES:\n", "        # load the precomputed undirected adjacency matrix from disk\n", "        adj_mat_undir = pd.read_feather(ADJ_DF_PATH)\n", "        adj_mat_undir = adj_mat_undir.set_index('index')\n", "        # also restore the raw connections and the token set\n", "        connections = load_pickle('connections.pkl')\n", "        unique_tokens = load_pickle('unique_tokens.pkl')\n", "    else:\n", "        adj_mat = obtain_adj_matrix(unique_tokens=unique_tokens, connections=connections)\n", "        adj_mat_undir = make_undir_adj_matrix(adj_mat=adj_mat)\n", "        # reset_index() keeps the token labels as column 'index' in the Feather file\n", "        save_df = adj_mat_undir.reset_index()\n", "        save_df.to_feather(ADJ_DF_PATH)\n", "        # also persist the raw connections and the token set\n", "        save_pickle(obj=connections, path='connections.pkl')\n", "        save_pickle(obj=unique_tokens, path='unique_tokens.pkl')" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " funktionsfähig Rechter Laserteller vorbereiten \\\n", "-Austausch 0 0 0 0 \n", "-Befestihgung 0 0 0 0 \n", "-Bereich 0 0 0 0 \n", "-Betonblock- 0 0 0 0 \n", "-Bremskombination 0 0 0 0 \n", "... ... ... ... ... \n", "überziechen 0 0 0 0 \n", "überziehen 0 0 0 0 \n", "überziehn 0 0 0 0 \n", "üblich 0 0 0 0 \n", "üperprüfen 0 0 0 0 \n", "\n", " weiterer Ausbau Travers Funktionsbereitschaft \\\n", "-Austausch 0 0 0 0 \n", "-Befestihgung 0 0 0 0 \n", "-Bereich 0 0 0 0 \n", "-Betonblock- 0 0 0 0 \n", "-Bremskombination 0 0 0 0 \n", "... ... ... ... ... \n", "überziechen 0 0 0 0 \n", "überziehen 0 0 0 0 \n", "überziehn 0 0 0 0 \n", "üblich 0 0 0 0 \n", "üperprüfen 0 0 0 0 \n", "\n", " umwandeln Hechtanlage ... Filterpumpe entwickeln \\\n", "-Austausch 0 0 ... 0 0 \n", "-Befestihgung 0 0 ... 0 0 \n", "-Bereich 0 0 ... 0 0 \n", "-Betonblock- 0 0 ... 0 0 \n", "-Bremskombination 0 0 ... 0 0 \n", "... ... ... ... ... ... \n", "überziechen 0 0 ... 0 0 \n", "überziehen 0 0 ... 0 0 \n", "überziehn 0 0 ... 0 0 \n", "üblich 0 0 ... 0 0 \n", "üperprüfen 0 0 ... 0 0 \n", "\n", " Pumpenstab Hauptrade anlernen Begutachtung \\\n", "-Austausch 0 0 0 0 \n", "-Befestihgung 0 0 0 0 \n", "-Bereich 0 0 0 0 \n", "-Betonblock- 0 0 0 0 \n", "-Bremskombination 0 0 0 0 \n", "... ... ... ... ... \n", "überziechen 0 0 0 0 \n", "überziehen 0 0 0 0 \n", "überziehn 0 0 0 0 \n", "üblich 0 0 0 0 \n", "üperprüfen 0 0 0 0 \n", "\n", " Betriebszeit Wassereinbruch Antriebszahnrad \\\n", "-Austausch 0 0 0 \n", "-Befestihgung 0 0 0 \n", "-Bereich 0 0 0 \n", "-Betonblock- 0 0 0 \n", "-Bremskombination 0 0 0 \n", "... ... ... ... \n", "überziechen 0 0 0 \n", "überziehen 0 0 0 \n", "überziehn 0 0 0 \n", "üblich 0 0 0 \n", "üperprüfen 0 0 0 \n", "\n", " Prostataproblem \n", "-Austausch 0 \n", "-Befestihgung 0 \n", "-Bereich 0 \n", "-Betonblock- 0 \n", "-Bremskombination 0 \n", "... ... 
\n", "überziechen 0 \n", "überziehen 0 \n", "überziehn 0 \n", "üblich 0 \n", "üperprüfen 0 \n", "\n", "[7046 rows x 7046 columns]" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adj_mat_undir.sort_index()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " koennen Weiterleitung Brand ein Geräteinneres Schmerz \\\n", "koennen 0 0 0 0 0 0 \n", "Weiterleitung 0 0 0 0 0 0 \n", "Brand 0 0 0 0 0 0 \n", "ein 0 0 0 0 0 0 \n", "Geräteinneres 0 0 0 0 0 0 \n", "... ... ... ... ... ... ... \n", "ausblasen 0 0 0 0 0 0 \n", "absprechen 0 0 0 0 0 0 \n", "Artikelnummer 0 0 0 0 0 0 \n", "Fehlersichtung 0 0 0 0 0 0 \n", "brannen 0 0 0 0 0 0 \n", "\n", " Monat Kontrawalzenbelag Funktionstest Kesselwasser ... \\\n", "koennen 0 0 0 0 ... \n", "Weiterleitung 0 0 0 0 ... \n", "Brand 0 0 0 0 ... \n", "ein 0 0 0 0 ... \n", "Geräteinneres 0 0 0 0 ... \n", "... ... ... ... ... ... \n", "ausblasen 0 0 0 0 ... \n", "absprechen 0 0 0 0 ... \n", "Artikelnummer 0 0 0 0 ... \n", "Fehlersichtung 0 0 0 0 ... \n", "brannen 0 0 0 0 ... \n", "\n", " Niveau Sprinkleranlage Abdeckglas Stoptast ORing \\\n", "koennen 0 0 0 0 0 \n", "Weiterleitung 0 0 0 0 0 \n", "Brand 0 0 0 0 0 \n", "ein 0 0 0 0 0 \n", "Geräteinneres 0 0 0 0 0 \n", "... ... ... ... ... ... \n", "ausblasen 0 0 0 0 0 \n", "absprechen 0 0 0 0 0 \n", "Artikelnummer 0 0 0 0 0 \n", "Fehlersichtung 0 0 0 0 0 \n", "brannen 0 0 0 0 0 \n", "\n", " ausblasen absprechen Artikelnummer Fehlersichtung brannen \n", "koennen 0 0 0 0 0 \n", "Weiterleitung 0 0 0 0 0 \n", "Brand 0 0 0 0 0 \n", "ein 0 0 0 0 0 \n", "Geräteinneres 0 0 0 0 0 \n", "... ... ... ... ... ... 
\n", "ausblasen 0 0 0 0 0 \n", "absprechen 0 0 0 0 0 \n", "Artikelnummer 0 0 0 0 0 \n", "Fehlersichtung 0 0 0 0 0 \n", "brannen 0 0 0 0 0 \n", "\n", "[6959 rows x 6959 columns]" ] }, "execution_count": 776, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adj_mat_undir" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = adj_mat_undir.to_numpy()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "24490" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# number of non-zero entries in the adjacency matrix\n", "np.count_nonzero(arr)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "92882" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# largest edge weight\n", "np.max(arr)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "195" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# number of distinct edge weights (including 0)\n", "uni_arr = np.unique(arr)\n", "len(uni_arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Threshold: edge weights below ``WEIGHT_THRESHOLD`` are set to 0, so only the stronger token connections are kept." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "WEIGHT_THRESHOLD = 50" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# start again from the unthresholded matrix\n", "arr = adj_mat_undir.to_numpy()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr = np.where(arr < WEIGHT_THRESHOLD, 0, arr)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "387" ] }, 
"execution_count": 788, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.count_nonzero(arr)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "177" ] }, "execution_count": 789, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "# tokens that keep at least one edge after thresholding (assuming non-negative weights)\n", "temp = np.sum(arr, axis=0)\n", "np.count_nonzero(temp)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# same token labels as adj_mat_undir, but with the thresholded weights\n", "thresh_adj_mat = adj_mat_undir.copy()\n", "thresh_adj_mat.loc[:] = arr" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " koennen Weiterleitung Brand ein Geräteinneres Schmerz \\\n", "koennen 0 0 0 0 0 0 \n", "Weiterleitung 0 0 0 0 0 0 \n", "Brand 0 0 0 0 0 0 \n", "ein 0 0 0 0 0 0 \n", "Geräteinneres 0 0 0 0 0 0 \n", "... ... ... ... ... ... ... \n", "ausblasen 0 0 0 0 0 0 \n", "absprechen 0 0 0 0 0 0 \n", "Artikelnummer 0 0 0 0 0 0 \n", "Fehlersichtung 0 0 0 0 0 0 \n", "brannen 0 0 0 0 0 0 \n", "\n", " Monat Kontrawalzenbelag Funktionstest Kesselwasser ... \\\n", "koennen 0 0 0 0 ... \n", "Weiterleitung 0 0 0 0 ... \n", "Brand 0 0 0 0 ... \n", "ein 0 0 0 0 ... \n", "Geräteinneres 0 0 0 0 ... \n", "... ... ... ... ... ... \n", "ausblasen 0 0 0 0 ... \n", "absprechen 0 0 0 0 ... \n", "Artikelnummer 0 0 0 0 ... \n", "Fehlersichtung 0 0 0 0 ... \n", "brannen 0 0 0 0 ... \n", "\n", " Niveau Sprinkleranlage Abdeckglas Stoptast ORing \\\n", "koennen 0 0 0 0 0 \n", "Weiterleitung 0 0 0 0 0 \n", "Brand 0 0 0 0 0 \n", "ein 0 0 0 0 0 \n", "Geräteinneres 0 0 0 0 0 \n", "... ... ... ... ... ... \n", "ausblasen 0 0 0 0 0 \n", "absprechen 0 0 0 0 0 \n", "Artikelnummer 0 0 0 0 0 \n", "Fehlersichtung 0 0 0 0 0 \n", "brannen 0 0 0 0 0 \n", "\n", " ausblasen absprechen Artikelnummer Fehlersichtung brannen \n", "koennen 0 0 0 0 0 \n", "Weiterleitung 0 0 0 0 0 \n", "Brand 0 0 0 0 0 \n", "ein 0 0 0 0 0 \n", "Geräteinneres 0 0 0 0 0 \n", "... ... ... ... ... ... 
\n", "ausblasen 0 0 0 0 0 \n", "absprechen 0 0 0 0 0 \n", "Artikelnummer 0 0 0 0 0 \n", "Fehlersichtung 0 0 0 0 0 \n", "brannen 0 0 0 0 0 \n", "\n", "[6959 rows x 6959 columns]" ] }, "execution_count": 791, "metadata": {}, "output_type": "execute_result" } ], "source": [ "thresh_adj_mat" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ADJ_MAT_PATH_CSV = f'./Graphanalyse/adj_mat_thresh_{WEIGHT_THRESHOLD}.csv'\n", "thresh_adj_mat.to_csv(path_or_buf=ADJ_MAT_PATH_CSV, encoding='cp1252', sep=';')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "***Testing***" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "important_words = []\n", "all_entities = []\n", "pos_counter = Counter()\n", "token_counter = 0\n", "RELEVANT_POS = {'NOUN', 'PROPN', 'ADJ', 'ADV'}\n", "\n", "# descr is a DataFrame, so iterate over the text column, not over the column names\n", "for description in descr['descr']:\n", "    doc = nlp(description)\n", "\n", "    relevant_words = []\n", "    for token in doc:\n", "        POS = token.pos_\n", "        token_counter += 1\n", "        pos_counter[POS] += 1\n", "\n", "        # keep lemmas of content words only: no stop words, punctuation or whitespace\n", "        if (not token.is_stop and not token.is_punct\n", "                and not token.is_space and POS in RELEVANT_POS):\n", "            relevant_words.append((token.lemma_.lower(), POS))\n", "\n", "    entities = [(ent.text, ent.label_) for ent in doc.ents]\n", "\n", "    important_words.extend(relevant_words)\n", "    all_entities.extend(entities)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('täglich', 'ADJ'),\n", " ('wartungstätigkeit', 'NOUN'),\n", " ('vorgabe', 'NOUN'),\n", " ('maschinenhersteller', 'NOUN'),\n", " ('wöchentliche', 'ADJ'),\n", " ('sichtkontrolle', 'NOUN'),\n", " ('reinigung', 'NOUN'),\n", 
('täglich', 'ADJ'),\n", " ('überprüfung', 'NOUN'),\n", " ('ölabscheider', 'NOUN'),\n", " ('wöchentlich', 'ADJ'),\n", " ('kontrolle', 'NOUN'),\n", " ('wc-anlage', 'NOUN'),\n", " ('halbjährliche', 'ADJ'),\n", " ('kontrolle', 'NOUN'),\n", " ('stabbreithalter', 'NOUN'),\n", " ('brandschutztechnische', 'ADJ'),\n", " ('prüfung', 'NOUN'),\n", " ('prüfung', 'NOUN'),\n", " ('scharniere', 'NOUN'),\n", " ('dichtung', 'NOUN'),\n", " ('schließvorrichtung', 'NOUN'),\n", " ('schloß', 'NOUN'),\n", " ('beschlag', 'NOUN'),\n", " ('allgemein', 'ADJ'),\n", " ('funktion', 'NOUN'),\n", " ('schmierung', 'NOUN'),\n", " ('festhaltevorrichtung', 'NOUN'),\n", " ('täglich', 'ADJ'),\n", " ('technikrundgang', 'NOUN'),\n", " ('monatliche', 'ADJ'),\n", " ('sichtkontrolle', 'NOUN'),\n", " ('monatliche', 'ADJ'),\n", " ('prüfung', 'NOUN'),\n", " ('scharniere', 'NOUN'),\n", " ('dichtung', 'NOUN'),\n", " ('schließvorrichtung', 'NOUN'),\n", " ('schloß', 'NOUN'),\n", " ('beschlag', 'NOUN'),\n", " ('allgemein', 'ADJ'),\n", " ('funktion', 'NOUN'),\n", " ('schmierung', 'NOUN'),\n", " ('festhaltevorrichtung', 'NOUN')]" ] }, "execution_count": 221, "metadata": {}, "output_type": "execute_result" } ], "source": [ "important_words" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "43" ] }, "execution_count": 222, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(important_words)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 223, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_entities" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "count = Counter(important_words)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Counter({('täglich', 'ADJ'): 3,\n", " ('prüfung', 'NOUN'): 3,\n", " ('sichtkontrolle', 'NOUN'): 
2,\n", " ('kontrolle', 'NOUN'): 2,\n", " ('scharniere', 'NOUN'): 2,\n", " ('dichtung', 'NOUN'): 2,\n", " ('schließvorrichtung', 'NOUN'): 2,\n", " ('schloß', 'NOUN'): 2,\n", " ('beschlag', 'NOUN'): 2,\n", " ('allgemein', 'ADJ'): 2,\n", " ('funktion', 'NOUN'): 2,\n", " ('schmierung', 'NOUN'): 2,\n", " ('festhaltevorrichtung', 'NOUN'): 2,\n", " ('monatliche', 'ADJ'): 2,\n", " ('wartungstätigkeit', 'NOUN'): 1,\n", " ('vorgabe', 'NOUN'): 1,\n", " ('maschinenhersteller', 'NOUN'): 1,\n", " ('wöchentliche', 'ADJ'): 1,\n", " ('reinigung', 'NOUN'): 1,\n", " ('überprüfung', 'NOUN'): 1,\n", " ('ölabscheider', 'NOUN'): 1,\n", " ('wöchentlich', 'ADJ'): 1,\n", " ('wc-anlage', 'NOUN'): 1,\n", " ('halbjährliche', 'ADJ'): 1,\n", " ('stabbreithalter', 'NOUN'): 1,\n", " ('brandschutztechnische', 'ADJ'): 1,\n", " ('technikrundgang', 'NOUN'): 1})" ] }, "execution_count": 225, "metadata": {}, "output_type": "execute_result" } ], "source": [ "count" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "NOUN 25722\n", "PUNCT 11626\n", "VERB 9093\n", "ADP 7211\n", "ADV 6526\n", "PROPN 4481\n", "NUM 4115\n", "DET 3845\n", "ADJ 2576\n", "AUX 2329\n", "PART 1561\n", "CCONJ 1305\n", "X 999\n", "PRON 916\n", "SCONJ 385\n", "SPACE 236\n", "INTJ 1\n", "dtype: int64" ] }, "execution_count": 180, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pos_count = pd.Series(data=pos_counter)\n", "pos_count.sort_values(ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "NOUN 0.310176\n", "PUNCT 0.140196\n", "VERB 0.109651\n", "ADP 0.086956\n", "ADV 0.078696\n", "PROPN 0.054035\n", "NUM 0.049622\n", "DET 0.046366\n", "ADJ 0.031063\n", "AUX 0.028085\n", "PART 0.018824\n", "CCONJ 0.015737\n", "X 0.012047\n", "PRON 0.011046\n", "SCONJ 0.004643\n", "SPACE 0.002846\n", "INTJ 0.000012\n", "dtype: float64" ] }, "execution_count": 184, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "pos_count_rel = pos_count / pos_count.sum()\n", "pos_count_rel.sort_values(ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "82927" ] }, "execution_count": 181, "metadata": {}, "output_type": "execute_result" } ], "source": [ "token_counter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Further analysis of the descriptions\n", "\n", "- Clarify the unclear relationships in the results for threshold 1200:\n", "  - find the corresponding descriptions\n", "  - put them into context\n", "- Identify further blacklist words" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Unclear relationships at threshold 1200" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " descr len num_occur \\\n", "index \n", "161 Tägliche Wartungstätigkeiten nach Vorgabe des ... 66 92592 \n", "33 Wöchentliche Sichtkontrolle Reinigung 37 1654 \n", "130 Tägliche Überprüfung der Ölabscheider 37 1616 \n", "159 Wöchentliche Kontrolle der WC-Anlagen 37 1265 \n", "139 Halbjährliche Kontrolle des Stabbreithalters 44 687 \n", "... ... ... ... \n", "2675 Stand 15.07.2020 Stöppel: Herr Langner Toyota ... 253 1 \n", "2674 Zahnräder der Laufkatze verschlissen Ersatztei... 167 1 \n", "2673 Bitte 8 Scheiben nach Muster anfertigen. Danke. 47 1 \n", "2672 Schalter für Bühne Schwenken abgerissen, bitte... 123 1 \n", "6781 Befestigung Deckel für Batteriefach defekt Hal... 99 1 \n", "\n", " assoc_obj_ids num_assoc_obj_ids \n", "index \n", "161 [0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53... 206 \n", "33 [301, 304, 305, 313, 314, 331, 332, 510, 511, ... 18 \n", "130 [0, 970, 2134, 2137] 4 \n", "159 [1352, 1353, 1354, 1684, 1685, 1686, 1687, 168... 11 \n", "139 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6... 166 \n", "... ... ... 
\n", "2675 [311] 1 \n", "2674 [415] 1 \n", "2673 [140] 1 \n", "2672 [323] 1 \n", "6781 [326] 1 \n", "\n", "[6782 rows x 5 columns]" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#temp2 = temp1.loc[temp1['num_occur'] >= 3, :]\n", "temp2 = temp1.copy()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#temp2 = temp2.iloc[:30,:]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# lemmas are lowercased below, so the search words must be lowercase too\n", "check_words = {'e1.8'}\n", "target_indices = list()\n", "\n", "for idx, row in temp2.iterrows():\n", "    text = row['descr']\n", "    doc = nlp(text)\n", "\n", "    # collect the lowercased lemmas of all tokens with a POS or tag of interest\n", "    token_set = set()\n", "    for token in doc:\n", "        if not (token.pos_ in POS_of_interest or token.tag_ in TAG_of_interest):\n", "            continue\n", "        token_set.add(token.lemma_.lower())\n", "\n", "    # keep descriptions that contain all search words\n", "    if token_set.issuperset(check_words):\n", "        target_indices.append(idx)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "target_indices" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Vorgaben aus Pleva Wartungsplan Schmieren der Rollenlager der beiden Kameralaufschlitten des Strukturdetektors SD 1C siehe Extradaten'" ] }, "execution_count": 506, "metadata": {}, "output_type": "execute_result" } ], "source": [ "idx = target_indices[3]\n", "temp2.at[idx, 'descr']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Leiterprüfung derzeit in Arbeit
Abteilungsleiter sind per Email am 11.06.2019 über deren Eigenverantwortlichkeit und Mithilfe durch Herr Graf informiert worden.'" ] }, "execution_count": 229, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp2.at[1921,'descr']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 197, "metadata": {}, "output_type": "execute_result" } ], "source": [ "token_set.issuperset(check_words)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'ADJD'}" ] }, "execution_count": 180, "metadata": {}, "output_type": "execute_result" } ], "source": [ "POS_of_interest\n", "TAG_of_interest" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test = 'Tägliche, tägliche Wartungstätigkeit des Maschinenherstellers Maschine'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "doc = nlp(test)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "täglich\n", "--\n", "täglich\n", "wartungstätigkeit\n", "der\n", "maschinenhersteller\n", "maschine\n" ] } ], "source": [ "for token in doc:\n", "    print(token.lemma_.lower())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# str.split() already splits on any whitespace, so only punctuation needs replacing\n", "replace_chars = [',']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test = test.lower()\n", "for char in replace_chars:\n", "    # replace with a space (not '') so that adjacent words are not merged\n", "    test = test.replace(char, ' ')\n", "test = set(test.split())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'des', 'maschine', 'maschinenherstellers', 'tägliche', 
'wartungstätigkeit'}" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test.issuperset(check_words)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Intermediate results:**\n", "\n", "*Some ObjektIDs contain the escape character and others do not; no ObjektID occurs in both variants.*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of duplicates = 47689 for description with index 171:\n", " Tägliche Wartungstätigkeiten nach Vorgabe des Maschinenherstellers\n", "\n" ] } ], "source": [ "print(f\"Number of duplicates = {max_val} for description with index {index}:\\n {text}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "# Feature 2: VorgangsArtText" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "feature = 'VorgangsArtText'" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "base = wo_duplicates.copy()\n", "base = base.dropna(axis=0, subset=[feature])\n", "base[feature] = base[feature].map(clean_string)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " VorgangsID ObjektID HObjektText \\\n", "0 11 114 427 C , Webmaschine, DL 280 EMS Breite 280 \n", "1 17 124 621 C , Webmaschine, DL 280 EMS Breite 280 \n", "2 53 244 285 C, Webmaschine, SG 220 EMS \n", "3 58 257 107, Webmaschine, OM 220 EOS \n", "4 81 138 00138, Schärmaschine 9, \n", "\n", " ObjektArtID ObjektArtText VorgangsTypID VorgangsTypName \\\n", "0 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n", "1 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n", "2 5 Greifer-Webmaschine 3 Reparaturauftrag (Portal) \n", "3 3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n", "4 16 Schärmaschine 3 Reparaturauftrag (Portal) \n", "\n", " VorgangsDatum VorgangsStatusId VorgangsPrioritaet \\\n", "0 2019-03-06 4 0 \n", "1 2019-03-11 5 0 \n", "2 2019-03-19 5 0 \n", "3 2019-03-21 5 0 \n", "4 2019-03-25 5 0 \n", "\n", " VorgangsBeschreibung VorgangsOrt \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 Kupplung schleift NaN \n", "3 Gegengewicht wieder anbringen NaN \n", "4 da ist etwas gebrochen. (Herr Heininger) NaN \n", "\n", " VorgangsArtText ErledigungsDatum \\\n", "0 Kettbaum kaputt 2019-03-06 \n", "1 asgasdg 2019-03-11 \n", "2 Kupplung defekt 2019-03-20 \n", "3 Gegengewicht an der Webmaschine abgefallen 2019-03-21 \n", "4 zentrale Bremsenverstellung linke Gatterseite ... 2019-03-25 \n", "\n", " ErledigungsArtText ErledigungsBeschreibung \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 Reparatur UTT NaN \n", "3 Reparatur UTT Schraube ausgebohrt\\nGegengewicht wieder angeb... \n", "4 Reparatur UTT Bolzen gebrochen. Bolzen neu angefertig und di... 
\n", "\n", " MPMelderArbeitsplatz MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n", "0 Weberei Weberei NaT 2019-03-06 \n", "1 Elektrowerkstatt Elektrowerkstatt NaT 2019-03-11 \n", "2 Weberei Weberei NaT 2019-03-19 \n", "3 Weberei Weberei 2019-03-21 2019-03-21 \n", "4 Vorwerk Vorwerk 2019-03-25 2019-03-25 " ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "base.head()" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Entries: 128936\n" ] } ], "source": [ "descriptions = base[feature]\n", "print(f\"Entries: {len(descriptions)}\")" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Duplicate VorgangsArtText entries: 128545\n", "Unique VorgangsArtText entries: 391\n", "Share of unique VorgangsArtText entries: 0.30 %\n" ] } ], "source": [ "num_dupl_descr = descriptions.duplicated().sum()\n", "uni_descr = descriptions.unique()\n", "num_uni_descr = len(uni_descr)\n", "\n", "print(f\"Duplicate {feature} entries: {num_dupl_descr}\")\n", "print(f\"Unique {feature} entries: {num_uni_descr}\")\n", "print(f\"Share of unique {feature} entries: {num_uni_descr / len(descriptions) * 100:.2f} %\")" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [], "source": [ "if not LOAD_CALC_FILES:\n", " cols = ['descr', 'len', 'num_occur', 'assoc_obj_ids', 'num_assoc_obj_ids']\n", " descr_df = pd.DataFrame(columns=cols)\n", " max_val = 0\n", " text = None\n", " index = 0\n", "\n", "\n", " for idx, description in enumerate(uni_descr):\n", " len_descr = len(description)\n", " filt = base[feature] == description\n", " temp = base[filt]\n", " assoc_obj_ids = temp['ObjektID'].unique()\n", " assoc_obj_ids = np.sort(assoc_obj_ids, kind='stable')\n", " num_assoc_obj_ids = len(assoc_obj_ids)\n", " num_dupl = filt.sum()\n", " \n", " conc_df = 
pd.DataFrame(data=[[\n", " description,\n", " len_descr,\n", " num_dupl,\n", " assoc_obj_ids,\n", " num_assoc_obj_ids\n", " ]], columns=cols)\n", " \n", " descr_df = pd.concat([descr_df, conc_df], ignore_index=True)\n", " \n", " if num_dupl > max_val:\n", " max_val = num_dupl\n", " index = idx\n", " text = description\n", " \n", " temp1 = descr_df.sort_values(by='num_occur', ascending=False)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " descr len num_occur \\\n", "60 Tägliche Interne Wartungstätigkeiten Weberei 44 92719 \n", "10 01 Interne Reinigung Pflege Überprüfung 39 11250 \n", "28 02 Interne Reinigung Pflege Überprüfung 39 3263 \n", "29 Maschinen-Wartung wöchentlich 29 2408 \n", "46 Gesetzliche Wartung Prüfung jährlich 36 2403 \n", ".. ... .. ... \n", "222 Walze WK 03 Umlenkwalze zapfen 30 1 \n", "224 Leiter Nr. 90 und überprüfen 28 1 \n", "225 Locht nicht mehr 16 1 \n", "226 Maschine stellt immer wieder ab 31 1 \n", "390 Gesetzliche Wartung Prüfung Anlagenprüfung Dru... 56 1 \n", "\n", " assoc_obj_ids num_assoc_obj_ids \n", "60 [0, 17, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53... 206 \n", "10 [0, 7, 425, 426, 427, 428, 429, 517, 518, 576,... 349 \n", "28 [576, 906, 910, 940, 941, 942, 943, 1040, 1041... 52 \n", "29 [1, 301, 305, 313, 314, 331, 332, 510, 511, 51... 25 \n", "46 [0, 191, 193, 195, 197, 200, 287, 288, 289, 29... 638 \n", ".. ... ... \n", "222 [1] 1 \n", "224 [1] 1 \n", "225 [338] 1 \n", "226 [338] 1 \n", "390 [547] 1 \n", "\n", "[391 rows x 5 columns]" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp1" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "# save/load dataframe\n", "FILE_PATH = f'{feature}_analyse_1.fth'\n", "if LOAD_CALC_FILES:\n", "    temp1 = pd.read_feather(FILE_PATH)\n", "    temp1 = temp1.set_index('index')\n", "else:\n", "    save_df = temp1.reset_index()\n", "    save_df.to_feather(FILE_PATH)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Entire dataset" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "# analyse all unique descriptions with their number of occurrences\n", "descr = temp1[['descr', 'num_occur']]\n", "#descr = descr.iloc[50:200,:]" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "#descr.iat[0,0] = 'Das ist ein Test am 24.08.2023'" ] }, { "cell_type": "code", 
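"execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Hedged sketch, not the original method: the loop over `uni_descr` above filters `base` once per\n",
"# unique description and grows `descr_df` with pd.concat, which copies the growing frame on every\n",
"# iteration. A single groupby should give the same columns as `temp1`. `obj_ids` and `summary` are\n",
"# hypothetical names; `base`, `feature` and the column 'ObjektID' are taken from the cells above.\n",
"obj_ids = base.groupby(feature, sort=False)['ObjektID']\n",
"\n",
"summary = pd.DataFrame({\n",
"    'num_occur': obj_ids.size(),  # number of rows per description\n",
"    # sorted unique ObjektIDs per description, mirroring np.sort(..., kind='stable') in the loop\n",
"    'assoc_obj_ids': obj_ids.unique().map(lambda ids: np.sort(ids, kind='stable')),\n",
"})\n",
"summary['num_assoc_obj_ids'] = summary['assoc_obj_ids'].map(len)\n",
"summary = summary.rename_axis('descr').reset_index()\n",
"summary.insert(1, 'len', summary['descr'].str.len())\n",
"summary = summary.sort_values(by='num_occur', ascending=False)\n",
"\n",
"# sanity check: every row of `base` belongs to exactly one description\n",
"assert summary['num_occur'].sum() == len(base)\n",
"summary.head()"
] }, { "cell_type": "code", 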
"execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "391" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(descr)" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " descr num_occur\n", "60 Tägliche Interne Wartungstätigkeiten Weberei 92719\n", "10 01 Interne Reinigung Pflege Überprüfung 11250\n", "28 02 Interne Reinigung Pflege Überprüfung 3263\n", "29 Maschinen-Wartung wöchentlich 2408\n", "46 Gesetzliche Wartung Prüfung jährlich 2403\n", ".. ... ...\n", "222 Walze WK 03 Umlenkwalze zapfen 1\n", "224 Leiter Nr. 90 und überprüfen 1\n", "225 Locht nicht mehr 1\n", "226 Maschine stellt immer wieder ab 1\n", "390 Gesetzliche Wartung Prüfung Anlagenprüfung Dru... 1\n", "\n", "[391 rows x 2 columns]" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "descr" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [], "source": [ "#LOAD_CALC_FILES = True\n", "#LOAD_CALC_FILES = False\n", "#IS_TEST = True\n", "IS_TEST = False" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "INFO:base:Number of entries processed: 1, Percent completed: 0.26\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "INFO:base:Number of entries processed: 101, Percent completed: 25.83\n", "INFO:base:Number of entries processed: 201, Percent completed: 51.41\n", "INFO:base:Number of entries processed: 301, Percent completed: 76.98\n" ] } ], "source": [ "# adjacency matrix\n", "connections = dict()\n", "unique_tokens = set()\n", "UPDATE_STATUS = 100\n", "length_data = len(descr)\n", "spell_check_candidates = set()\n", "spell_checker = SpellChecker(language='de', distance=1)\n", "\n", "if not LOAD_CALC_FILES or IS_TEST:\n", "    for count, (_, row) in enumerate(descr.iterrows()):\n", "        text = row['descr']\n", "        weight = row['num_occur']  # number of occurrences of this description\n", "\n", "        doc = nlp(text)\n", "\n", "        obtain_descendant_info(\n", "            doc=doc,\n", "            weight=weight,\n", "            POS_of_interest=POS_of_interest,\n", "            TAG_of_interest=TAG_of_interest,\n", "            connections=connections,\n", "            unique_tokens=unique_tokens,\n", "            spell_check_candidates=spell_check_candidates,\n", "            spell_check_whitelist=spell_check_whitelist,\n", "            spell_checker=spell_checker,\n", "            corrections=corrections,\n", "        )\n", "\n", "        if count % UPDATE_STATUS == 0:\n", "            logger.info(f'Number of entries processed: {count+1}, Percent completed: {((count+1) / length_data) * 100:.2f}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [], "source": [ "ADJ_DF_PATH = f'./Graphanalyse/adj_mat_df_{feature}.fth'\n", "if not IS_TEST:\n", "    if LOAD_CALC_FILES:\n", "        adj_mat_undir = pd.read_feather(ADJ_DF_PATH)\n", "        adj_mat_undir = adj_mat_undir.set_index('index')\n", "        # additional information\n", "        connections = load_pickle('connections.pkl')\n", "        unique_tokens = load_pickle('unique_tokens.pkl')\n", "    else:\n", "        adj_mat = obtain_adj_matrix(unique_tokens=unique_tokens, connections=connections)\n", "        adj_mat_undir = make_undir_adj_matrix(adj_mat=adj_mat)\n", "        save_df = adj_mat_undir.reset_index()\n", "        save_df.to_feather(ADJ_DF_PATH)\n", "        # additional information\n", "        save_pickle(obj=connections, path='connections.pkl')\n", "        save_pickle(obj=unique_tokens, path='unique_tokens.pkl')" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " lecken WC LKW offen Maschinen-Reinigung \\\n", "12-monatige-Inspektion 0 0 0 0 0 \n", "2-monatlich 0 0 0 0 0 \n", "2-wöchentlich 0 0 0 0 0 \n", "24-monatige-Inspektion 0 0 0 0 0 \n", "3-jährlich 0 0 0 0 0 \n", "... ... .. ... ... ... \n", "Ölwechsel 0 0 0 0 0 \n", "Überprüfung 0 0 0 0 0 \n", "äußerer 0 0 0 0 0 \n", "überprüfen 0 0 0 0 0 \n", "überziehen 0 0 0 0 0 \n", "\n", " Dockenwickler halb-jährlich Tisch zentral \\\n", "12-monatige-Inspektion 0 0 0 0 \n", "2-monatlich 0 0 0 0 \n", "2-wöchentlich 0 0 0 0 \n", "24-monatige-Inspektion 0 0 0 0 \n", "3-jährlich 0 0 0 0 \n", "... ... ... ... ... \n", "Ölwechsel 0 0 0 0 \n", "Überprüfung 0 0 0 0 \n", "äußerer 0 0 0 0 \n", "überprüfen 0 0 0 0 \n", "überziehen 0 0 0 0 \n", "\n", " anbringen ... undicht- Platine erneuern \\\n", "12-monatige-Inspektion 0 ... 0 0 0 \n", "2-monatlich 0 ... 0 0 0 \n", "2-wöchentlich 0 ... 0 0 0 \n", "24-monatige-Inspektion 0 ... 0 0 0 \n", "3-jährlich 0 ... 0 0 0 \n", "... ... ... ... ... ... \n", "Ölwechsel 0 ... 0 0 0 \n", "Überprüfung 0 ... 0 0 0 \n", "äußerer 0 ... 0 0 0 \n", "überprüfen 0 ... 0 0 0 \n", "überziehen 0 ... 0 0 0 \n", "\n", " Verschmutzung befestigen wechseln Labor Walze \\\n", "12-monatige-Inspektion 0 0 0 0 0 \n", "2-monatlich 0 0 0 0 0 \n", "2-wöchentlich 0 0 0 0 0 \n", "24-monatige-Inspektion 0 0 0 0 0 \n", "3-jährlich 0 0 0 0 0 \n", "... ... ... ... ... ... \n", "Ölwechsel 0 0 0 0 0 \n", "Überprüfung 0 0 0 0 0 \n", "äußerer 0 0 0 0 0 \n", "überprüfen 0 0 0 0 0 \n", "überziehen 0 0 0 0 1 \n", "\n", " anfahren Leiter \n", "12-monatige-Inspektion 0 0 \n", "2-monatlich 0 0 \n", "2-wöchentlich 0 0 \n", "24-monatige-Inspektion 0 0 \n", "3-jährlich 0 0 \n", "... ... ... 
\n", "Ölwechsel 0 0 \n", "Überprüfung 0 0 \n", "äußerer 0 0 \n", "überprüfen 0 1 \n", "überziehen 0 0 \n", "\n", "[390 rows x 390 columns]" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adj_mat_undir.sort_index()" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "arr = adj_mat_undir.to_numpy()" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "391" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.count_nonzero(arr)" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "92964" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.max(arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Threshold" ] }, { "cell_type": "code", "execution_count": 162, "metadata": {}, "outputs": [], "source": [ "WEIGHT_THRESHOLD = 0" ] }, { "cell_type": "code", "execution_count": 163, "metadata": {}, "outputs": [], "source": [ "arr = adj_mat_undir.to_numpy()" ] }, { "cell_type": "code", "execution_count": 164, "metadata": {}, "outputs": [], "source": [ "arr = np.where(arr < WEIGHT_THRESHOLD, 0, arr)" ] }, { "cell_type": "code", "execution_count": 165, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "391" ] }, "execution_count": 165, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.count_nonzero(arr)" ] }, { "cell_type": "code", "execution_count": 166, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "233" ] }, "execution_count": 166, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp = np.sum(arr, axis=0)\n", "np.count_nonzero(temp)" ] }, { "cell_type": "code", "execution_count": 167, "metadata": {}, "outputs": [], "source": [ "thresh_adj_mat = adj_mat_undir.copy()\n", "thresh_adj_mat.loc[:] = arr" ] }, { "cell_type": "code", 
"execution_count": 168, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Wasserleitung wechseln Winkelpositionsgeber \\\n", "Wasserleitung 0 0 0 \n", "wechseln 0 0 0 \n", "Winkelpositionsgeber 0 0 0 \n", "Klimaanlagengerät 0 0 0 \n", "versetzen 0 0 0 \n", "... ... ... ... \n", "haben 0 0 0 \n", "Wasserenthärtungsanlage 0 0 0 \n", "Gestank 0 0 0 \n", "Zahnrad 0 0 0 \n", "hydraulisch 0 0 0 \n", "\n", " Klimaanlagengerät versetzen Brennschlitten \\\n", "Wasserleitung 0 0 0 \n", "wechseln 0 0 0 \n", "Winkelpositionsgeber 0 0 0 \n", "Klimaanlagengerät 0 0 0 \n", "versetzen 0 0 0 \n", "... ... ... ... \n", "haben 0 0 0 \n", "Wasserenthärtungsanlage 0 0 0 \n", "Gestank 0 0 0 \n", "Zahnrad 0 0 0 \n", "hydraulisch 0 0 0 \n", "\n", " feststellen Stuhl monatlich anfertigen ... \\\n", "Wasserleitung 0 0 0 0 ... \n", "wechseln 0 0 0 0 ... \n", "Winkelpositionsgeber 0 0 0 0 ... \n", "Klimaanlagengerät 0 0 0 0 ... \n", "versetzen 0 0 0 0 ... \n", "... ... ... ... ... ... \n", "haben 0 0 0 0 ... \n", "Wasserenthärtungsanlage 0 0 0 0 ... \n", "Gestank 0 0 0 0 ... \n", "Zahnrad 0 0 0 0 ... \n", "hydraulisch 0 0 0 0 ... \n", "\n", " Zahnriemen Rampe Tisch defekt Elektrische haben \\\n", "Wasserleitung 0 0 0 0 0 0 \n", "wechseln 0 0 0 0 0 0 \n", "Winkelpositionsgeber 0 0 0 1 0 0 \n", "Klimaanlagengerät 0 0 0 0 0 0 \n", "versetzen 0 0 0 0 0 0 \n", "... ... ... ... ... ... ... \n", "haben 0 0 0 0 0 0 \n", "Wasserenthärtungsanlage 0 0 0 0 0 0 \n", "Gestank 0 0 0 0 0 0 \n", "Zahnrad 0 0 0 0 0 0 \n", "hydraulisch 0 0 0 0 0 0 \n", "\n", " Wasserenthärtungsanlage Gestank Zahnrad \\\n", "Wasserleitung 0 0 0 \n", "wechseln 0 0 0 \n", "Winkelpositionsgeber 0 0 0 \n", "Klimaanlagengerät 0 0 0 \n", "versetzen 0 0 0 \n", "... ... ... ... \n", "haben 0 0 0 \n", "Wasserenthärtungsanlage 0 0 0 \n", "Gestank 0 0 0 \n", "Zahnrad 0 0 0 \n", "hydraulisch 0 0 0 \n", "\n", " hydraulisch \n", "Wasserleitung 0 \n", "wechseln 0 \n", "Winkelpositionsgeber 0 \n", "Klimaanlagengerät 0 \n", "versetzen 0 \n", "... ... 
\n", "haben 0 \n", "Wasserenthärtungsanlage 0 \n", "Gestank 0 \n", "Zahnrad 0 \n", "hydraulisch 0 \n", "\n", "[390 rows x 390 columns]" ] }, "execution_count": 168, "metadata": {}, "output_type": "execute_result" } ], "source": [ "thresh_adj_mat" ] }, { "cell_type": "code", "execution_count": 169, "metadata": {}, "outputs": [], "source": [ "ADJ_MAT_PATH_CSV = f'./Graphanalyse/adj_mat_thresh_{feature}_{WEIGHT_THRESHOLD}.csv'\n", "thresh_adj_mat.to_csv(path_or_buf=ADJ_MAT_PATH_CSV, encoding='cp1252', sep=';')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "# Feature 3: ErledigungsBeschreibung" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [], "source": [ "feature = 'ErledigungsBeschreibung'" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [], "source": [ "base = wo_duplicates.copy()\n", "base = base.dropna(axis=0, subset=[feature])\n", "base[feature] = base[feature].map(clean_string)" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " VorgangsID ObjektID HObjektText ObjektArtID \\\n", "3 58 257 107, Webmaschine, OM 220 EOS 3 \n", "4 81 138 00138, Schärmaschine 9, 16 \n", "5 82 0 Warenschau allgemein 0 \n", "6 76 0 Neben der Türe 0 \n", "8 111 241 294 C, Webmaschine, SG 240 EMS 5 \n", "\n", " ObjektArtText VorgangsTypID VorgangsTypName \\\n", "3 Luft-Webmaschine 3 Reparaturauftrag (Portal) \n", "4 Schärmaschine 3 Reparaturauftrag (Portal) \n", "5 NaN 3 Reparaturauftrag (Portal) \n", "6 NaN 3 Reparaturauftrag (Portal) \n", "8 Greifer-Webmaschine 3 Reparaturauftrag (Portal) \n", "\n", " VorgangsDatum VorgangsStatusId VorgangsPrioritaet \\\n", "3 2019-03-21 5 0 \n", "4 2019-03-25 5 0 \n", "5 2019-03-25 5 0 \n", "6 2019-03-22 5 0 \n", "8 2019-04-01 5 0 \n", "\n", " VorgangsBeschreibung VorgangsOrt \\\n", "3 Gegengewicht wieder anbringen NaN \n", "4 da ist etwas gebrochen. (Herr Heininger) NaN \n", "5 Klappbügel Portalkran H31 defekt Warenschau allgemein \n", "6 Schraube nix mer gut Neben der Türe \n", "8 KBK tauschen\\nUrsache vermutlich mechanisch NaN \n", "\n", " VorgangsArtText ErledigungsDatum \\\n", "3 Gegengewicht an der Webmaschine abgefallen 2019-03-21 \n", "4 zentrale Bremsenverstellung linke Gatterseite ... 2019-03-25 \n", "5 Allgemeine Reparaturarbeiten 2019-03-25 \n", "6 Kettbaum 2019-03-25 \n", "8 Kupplung-Brems-Kombination 2019-04-08 \n", "\n", " ErledigungsArtText ErledigungsBeschreibung \\\n", "3 Reparatur UTT Schraube ausgebohrt Gegengewicht wieder angebr... \n", "4 Reparatur UTT Bolzen gebrochen. Bolzen neu angefertig und di... \n", "5 Reparatur UTT Feder ausgetauscht \n", "6 Reparatur UTT Schrauben ausgebohrt Gewinde nachgeschnitten \n", "8 Reparatur UTT da derzeit Keine Ersatzteile da Reparatur mit ... 
\n", "\n", " MPMelderArbeitsplatz MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n", "3 Weberei Weberei 2019-03-21 2019-03-21 \n", "4 Vorwerk Vorwerk 2019-03-25 2019-03-25 \n", "5 Warenschau Warenschau 2019-03-25 2019-03-25 \n", "6 Vorwerk Vorwerk 2019-03-25 2019-03-22 \n", "8 Weberei Weberei 2019-04-02 2019-04-01 " ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "base.head()" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Entries: 118086\n" ] } ], "source": [ "descriptions = base[feature]\n", "print(f\"Entries: {len(descriptions)}\")" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of duplicate ErledigungsBeschreibung: 110707\n", "Number of unique ErledigungsBeschreibung: 7379\n", "Share of unique ErledigungsBeschreibung: 6.25 %\n" ] } ], "source": [ "num_dupl_descr = descriptions.duplicated().sum()\n", "uni_descr = descriptions.unique()\n", "num_uni_descr = len(uni_descr)\n", "\n", "print(f\"Number of duplicate {feature}: {num_dupl_descr}\")\n", "print(f\"Number of unique {feature}: {num_uni_descr}\")\n", "print(f\"Share of unique {feature}: {num_uni_descr / len(descriptions) * 100:.2f} %\")" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "LOAD_CALC_FILES" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [], "source": [ "if not LOAD_CALC_FILES:\n", " # one row per unique description: text, length, number of occurrences and associated object IDs\n", " cols = ['descr', 'len', 'num_occur', 'assoc_obj_ids', 'num_assoc_obj_ids']\n", " descr_df = pd.DataFrame(columns=cols)\n", " # track the most frequent description seen so far\n", " max_val = 0\n", " text = None\n", " index = 0\n", "\n", "\n", " for idx, description in enumerate(uni_descr):\n", " len_descr = len(description)\n", " # boolean mask selecting all entries with exactly this description\n", " filt = base[feature] == description\n", " temp = base[filt]\n", " # sorted unique object IDs that share this description\n", " assoc_obj_ids = temp['ObjektID'].unique()\n", " assoc_obj_ids = np.sort(assoc_obj_ids, kind='stable')\n", " num_assoc_obj_ids = len(assoc_obj_ids)\n", " num_dupl = filt.sum()\n", " \n", " conc_df = pd.DataFrame(data=[[\n", " description,\n", " len_descr,\n", " num_dupl,\n", " assoc_obj_ids,\n", " num_assoc_obj_ids\n", " ]], columns=cols)\n", " \n", " descr_df = pd.concat([descr_df, conc_df], ignore_index=True)\n", " \n", " if num_dupl > max_val:\n", " max_val = num_dupl\n", " index = idx\n", " text = description\n", " \n", " # most frequent descriptions first\n", " temp1 = descr_df.sort_values(by='num_occur', ascending=False)" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " descr len num_occur \\\n", "112 Sichtkontrolle durchgeführt Auffälligkeiten fe... 95 98720 \n", "108 Sichtkontrolle durchgeführt Auffälligkeiten fe... 100 1450 \n", "147 Externe Prüfung wurde durchgeführt Beanstandun... 119 1082 \n", "128 Reinigung durchgeführt Auffälligkeiten festges... 90 762 \n", "96 Sichtkontrolle wie festgelegt durchgeführt Auf... 110 648 \n", "... ... ... ... \n", "2805 X Achse Süd Führungswägen Kurze Version eingebaut 49 1 \n", "2804 Maschinenrahmen ausgerichtet und ausgebeult. M... 90 1 \n", "2803 Bügel und Stützräder getauscht 30 1 \n", "2802 Graf: TK wurde in Arbeitsauftrag 65487 gewandelt 48 1 \n", "7378 Neue Gasfeder eingebaut 23 1 \n", "\n", " assoc_obj_ids num_assoc_obj_ids \n", "112 [0, 1, 7, 17, 41, 42, 43, 44, 45, 46, 47, 51, ... 953 \n", "108 [0, 1, 140, 301, 305, 313, 314, 576, 970, 1110... 28 \n", "147 [191, 193, 195, 197, 200, 264, 287, 288, 289, ... 413 \n", "128 [0, 1, 7, 123, 136, 137, 138, 177, 298, 304, 3... 90 \n", "96 [1, 20, 21, 51, 52, 53, 54, 55, 56, 64, 65, 66... 271 \n", "... ... ... 
\n", "2805 [21] 1 \n", "2804 [144] 1 \n", "2803 [315] 1 \n", "2802 [405] 1 \n", "7378 [326] 1 \n", "\n", "[7379 rows x 5 columns]" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp1" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Sichtkontrolle durchgeführt Auffälligkeiten festgestellt vom Ausführenden bitte dazu schreiben:'" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp1.iat[0,0]" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Sichtkontrolle durchgeführt Auffälligkeiten festgestellt vom Ausführenden bitte dazu schreiben: Nein'" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp1.iat[1,0]" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [], "source": [ "# save/load dataframe\n", "FILE_PATH = f'{feature}_analyse_1.fth'\n", "if LOAD_CALC_FILES:\n", " temp1 = pd.read_feather(FILE_PATH)\n", " temp1 = temp1.set_index('index')\n", "else:\n", " save_df = temp1.reset_index()\n", " save_df.to_feather(FILE_PATH)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Entire dataset" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [], "source": [ "# use all unique descriptions with their occurrence counts (uncomment the slice below to analyze a subset only)\n", "descr = temp1[['descr', 'num_occur']]\n", "#descr = descr.iloc[50:200,:]" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [], "source": [ "#descr.iat[0,0] = 'Das ist ein Test am 24.08.2023'" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "7379" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(descr)" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " descr num_occur\n", "112 Sichtkontrolle durchgeführt Auffälligkeiten fe... 98720\n", "108 Sichtkontrolle durchgeführt Auffälligkeiten fe... 1450\n", "147 Externe Prüfung wurde durchgeführt Beanstandun... 1082\n", "128 Reinigung durchgeführt Auffälligkeiten festges... 762\n", "96 Sichtkontrolle wie festgelegt durchgeführt Auf... 648\n", "... ... ...\n", "2805 X Achse Süd Führungswägen Kurze Version eingebaut 1\n", "2804 Maschinenrahmen ausgerichtet und ausgebeult. M... 1\n", "2803 Bügel und Stützräder getauscht 1\n", "2802 Graf: TK wurde in Arbeitsauftrag 65487 gewandelt 1\n", "7378 Neue Gasfeder eingebaut 1\n", "\n", "[7379 rows x 2 columns]" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "descr" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [], "source": [ "#LOAD_CALC_FILES = True\n", "#LOAD_CALC_FILES = False\n", "#IS_TEST = True\n", "IS_TEST = False" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "INFO:base:Number of entries processed: 1, Percent completed: 0.01\n", "INFO:base:Number of entries processed: 501, Percent completed: 6.79\n", "INFO:base:Number of entries processed: 1001, Percent completed: 13.57\n", "INFO:base:Number of entries processed: 1501, Percent completed: 20.34\n", "INFO:base:Number of entries processed: 2001, Percent completed: 27.12\n", "INFO:base:Number of entries processed: 2501, Percent completed: 33.89\n", "INFO:base:Number of entries processed: 3001, Percent completed: 40.67\n", "INFO:base:Number of entries processed: 3501, Percent completed: 47.45\n", "INFO:base:Number of entries processed: 4001, Percent completed: 54.22\n", "INFO:base:Number of entries processed: 4501, Percent completed: 61.00\n", "INFO:base:Number of entries processed: 5001, Percent completed: 67.77\n", "INFO:base:Number of entries processed: 5501, Percent completed: 74.55\n", 
"INFO:base:Number of entries processed: 6001, Percent completed: 81.33\n", "INFO:base:Number of entries processed: 6501, Percent completed: 88.10\n", "INFO:base:Number of entries processed: 7001, Percent completed: 94.88\n" ] } ], "source": [ "# adjacency matrix\n", "connections = dict()\n", "unique_tokens = set()\n", "UPDATE_STATUS = 500\n", "length_data = len(descr)\n", "spell_check_candidates = set()\n", "spell_checker = SpellChecker(language='de', distance=1)\n", "\n", "if not LOAD_CALC_FILES or IS_TEST:\n", " for count, description in enumerate(descr.iterrows()):\n", " \n", " text = description[1]['descr']\n", " weight = description[1]['num_occur']\n", " \n", " doc = nlp(text)\n", " \n", " obtain_descendant_info(\n", " doc=doc,\n", " weight=weight,\n", " POS_of_interest=POS_of_interest,\n", " TAG_of_interest=TAG_of_interest,\n", " connections=connections,\n", " unique_tokens=unique_tokens,\n", " spell_check_candidates=spell_check_candidates,\n", " spell_check_whitelist=spell_check_whitelist,\n", " spell_checker=spell_checker,\n", " corrections=corrections,\n", " )\n", " \n", " if count % UPDATE_STATUS == 0:\n", " logger.info(f'Number of entries processed: {count+1}, Percent completed: {((count+1) / length_data) * 100:.2f}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [], "source": [ "ADJ_DF_PATH = f'./Graphanalyse/adj_mat_df_{feature}.fth'\n", "if not IS_TEST:\n", " if LOAD_CALC_FILES:\n", " adj_mat_undir = pd.read_feather(ADJ_DF_PATH)\n", " adj_mat_undir = adj_mat_undir.set_index('index')\n", " # additional information\n", " connections = load_pickle('connections.pkl')\n", " unique_tokens = load_pickle('unique_tokens.pkl')\n", " else:\n", " adj_mat = obtain_adj_matrix(unique_tokens=unique_tokens, connections=connections)\n", " adj_mat_undir = make_undir_adj_matrix(adj_mat=adj_mat)\n", " save_df = adj_mat_undir.reset_index()\n", " 
save_df.to_feather(ADJ_DF_PATH)\n", " # additional information\n", " save_pickle(obj=connections, path='connections.pkl')\n", " save_pickle(obj=unique_tokens, path='unique_tokens.pkl')" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " funktionsfähig Zwischenbehälter Ölfilter Rechter \\\n", "-20C 0 0 0 0 \n", "-Befestihgung 0 0 0 0 \n", "-Einlaufwalze 0 0 0 0 \n", "-Entlüftungssicherung 0 0 0 0 \n", "-Faltbalken 0 0 0 0 \n", "... ... ... ... ... \n", "überzogenn 0 0 0 0 \n", "überzoggen 0 0 0 0 \n", "übrtprüfen 0 0 0 0 \n", "ünerziehen 0 0 0 0 \n", "üperprüfen 0 0 0 0 \n", "\n", " Kontaktproblem Geschweisst vorbereiten Gelenkbolzen \\\n", "-20C 0 0 0 0 \n", "-Befestihgung 0 0 0 0 \n", "-Einlaufwalze 0 0 0 0 \n", "-Entlüftungssicherung 0 0 0 0 \n", "-Faltbalken 0 0 0 0 \n", "... ... ... ... ... \n", "überzogenn 0 0 0 0 \n", "überzoggen 0 0 0 0 \n", "übrtprüfen 0 0 0 0 \n", "ünerziehen 0 0 0 0 \n", "üperprüfen 0 0 0 0 \n", "\n", " Silikonfass Ausbau ... Kom anlernen nah \\\n", "-20C 0 0 ... 0 0 0 \n", "-Befestihgung 0 0 ... 0 0 0 \n", "-Einlaufwalze 0 0 ... 0 0 0 \n", "-Entlüftungssicherung 0 0 ... 0 0 0 \n", "-Faltbalken 0 0 ... 0 0 0 \n", "... ... ... ... ... ... ... \n", "überzogenn 0 0 ... 0 0 0 \n", "überzoggen 0 0 ... 0 0 0 \n", "übrtprüfen 0 0 ... 0 0 0 \n", "ünerziehen 0 0 ... 0 0 0 \n", "üperprüfen 0 0 ... 0 0 0 \n", "\n", " Begutachtung Betriebszeit paletten augetreten \\\n", "-20C 0 0 0 0 \n", "-Befestihgung 0 0 0 0 \n", "-Einlaufwalze 0 0 0 0 \n", "-Entlüftungssicherung 0 0 0 0 \n", "-Faltbalken 0 0 0 0 \n", "... ... ... ... ... \n", "überzogenn 0 0 0 0 \n", "überzoggen 0 0 0 0 \n", "übrtprüfen 0 0 0 0 \n", "ünerziehen 0 0 0 0 \n", "üperprüfen 0 0 0 0 \n", "\n", " Antriebszahnrad Gewindereparaturset Heizventil \n", "-20C 0 0 0 \n", "-Befestihgung 0 0 0 \n", "-Einlaufwalze 0 0 0 \n", "-Entlüftungssicherung 0 0 0 \n", "-Faltbalken 0 0 0 \n", "... ... ... ... 
\n", "überzogenn 0 0 0 \n", "überzoggen 0 0 0 \n", "übrtprüfen 0 0 0 \n", "ünerziehen 0 0 0 \n", "üperprüfen 0 0 0 \n", "\n", "[6946 rows x 6946 columns]" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adj_mat_undir.sort_index()" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [], "source": [ "arr = adj_mat_undir.to_numpy()" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "24171" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.count_nonzero(arr)" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "103601" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.max(arr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Threshold" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [], "source": [ "WEIGHT_THRESHOLD = 30" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [], "source": [ "arr = adj_mat_undir.to_numpy()" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [], "source": [ "arr = np.where(arr < WEIGHT_THRESHOLD, 0, arr)" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "138" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.count_nonzero(arr)" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [], "source": [ "thresh_adj_mat = adj_mat_undir.copy()\n", "thresh_adj_mat.loc[:] = arr" ] }, { "cell_type": "code", "execution_count": 117, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " funktionsfähig Zwischenbehälter Ölfilter Rechter \\\n", "funktionsfähig 0 0 0 0 \n", "Zwischenbehälter 0 0 0 0 \n", "Ölfilter 0 0 0 0 \n", "Rechter 0 0 0 0 \n", "Kontaktproblem 0 0 0 0 \n", "... ... ... ... ... \n", "paletten 0 0 0 0 \n", "augetreten 0 0 0 0 \n", "Antriebszahnrad 0 0 0 0 \n", "Gewindereparaturset 0 0 0 0 \n", "Heizventil 0 0 0 0 \n", "\n", " Kontaktproblem Geschweisst vorbereiten Gelenkbolzen \\\n", "funktionsfähig 0 0 0 0 \n", "Zwischenbehälter 0 0 0 0 \n", "Ölfilter 0 0 0 0 \n", "Rechter 0 0 0 0 \n", "Kontaktproblem 0 0 0 0 \n", "... ... ... ... ... \n", "paletten 0 0 0 0 \n", "augetreten 0 0 0 0 \n", "Antriebszahnrad 0 0 0 0 \n", "Gewindereparaturset 0 0 0 0 \n", "Heizventil 0 0 0 0 \n", "\n", " Silikonfass Ausbau ... Kom anlernen nah \\\n", "funktionsfähig 0 0 ... 0 0 0 \n", "Zwischenbehälter 0 0 ... 0 0 0 \n", "Ölfilter 0 0 ... 0 0 0 \n", "Rechter 0 0 ... 0 0 0 \n", "Kontaktproblem 0 0 ... 0 0 0 \n", "... ... ... ... ... ... ... \n", "paletten 0 0 ... 0 0 0 \n", "augetreten 0 0 ... 0 0 0 \n", "Antriebszahnrad 0 0 ... 0 0 0 \n", "Gewindereparaturset 0 0 ... 0 0 0 \n", "Heizventil 0 0 ... 0 0 0 \n", "\n", " Begutachtung Betriebszeit paletten augetreten \\\n", "funktionsfähig 0 0 0 0 \n", "Zwischenbehälter 0 0 0 0 \n", "Ölfilter 0 0 0 0 \n", "Rechter 0 0 0 0 \n", "Kontaktproblem 0 0 0 0 \n", "... ... ... ... ... \n", "paletten 0 0 0 0 \n", "augetreten 0 0 0 0 \n", "Antriebszahnrad 0 0 0 0 \n", "Gewindereparaturset 0 0 0 0 \n", "Heizventil 0 0 0 0 \n", "\n", " Antriebszahnrad Gewindereparaturset Heizventil \n", "funktionsfähig 0 0 0 \n", "Zwischenbehälter 0 0 0 \n", "Ölfilter 0 0 0 \n", "Rechter 0 0 0 \n", "Kontaktproblem 0 0 0 \n", "... ... ... ... 
\n", "paletten 0 0 0 \n", "augetreten 0 0 0 \n", "Antriebszahnrad 0 0 0 \n", "Gewindereparaturset 0 0 0 \n", "Heizventil 0 0 0 \n", "\n", "[6946 rows x 6946 columns]" ] }, "execution_count": 117, "metadata": {}, "output_type": "execute_result" } ], "source": [ "thresh_adj_mat" ] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [], "source": [ "ADJ_MAT_PATH_CSV = f'./Graphanalyse/adj_mat_thresh_{feature}_{WEIGHT_THRESHOLD}.csv'\n", "thresh_adj_mat.to_csv(path_or_buf=ADJ_MAT_PATH_CSV, encoding='cp1252', sep=';')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "# **Zusatz**\n", "\n", "#### **Analysiere beispielhaft Eintrag mit meisten Duplikaten**" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Anzahl Einträge mit gewählter Beschreibung: 47689\n" ] } ], "source": [ "crit = uni_descr[171]\n", "filt = wo_duplicates['VorgangsBeschreibung'] == crit\n", "temp = wo_duplicates[filt]\n", "print(f\"Anzahl Einträge mit gewählter Beschreibung: {len(temp)}\")" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
\n", "
" ], "text/plain": [ " VorgangsID ObjektID HObjektText \\\n", "288 155717 187 246, Webmaschine Jacquard, \n", "289 152507 177 204 S SI , Webmaschine, DL 280 EMS Breite 220 \n", "318 255972 249 203 C S SI, Webmaschine, DL 280 EMS Breite 220 \n", "319 255977 249 203 C S SI, Webmaschine, DL 280 EMS Breite 220 \n", "340 267942 187 246, Webmaschine Jacquard, \n", "\n", " ObjektArtID ObjektArtText VorgangsTypID VorgangsTypName \\\n", "288 6 Jacquard-Webmaschine 1 Wartung \n", "289 3 Luft-Webmaschine 1 Wartung \n", "318 3 Luft-Webmaschine 1 Wartung \n", "319 3 Luft-Webmaschine 1 Wartung \n", "340 6 Jacquard-Webmaschine 1 Wartung \n", "\n", " VorgangsDatum VorgangsStatusId VorgangsPrioritaet \\\n", "288 2022-04-01 5 0 \n", "289 2022-04-09 5 0 \n", "318 2022-07-30 5 0 \n", "319 2022-08-04 5 0 \n", "340 2022-08-07 5 0 \n", "\n", " VorgangsBeschreibung VorgangsOrt \\\n", "288 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "289 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "318 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "319 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "340 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "\n", " VorgangsArtText ErledigungsDatum \\\n", "288 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "289 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-09 \n", "318 Tägliche Interne Wartungstätigkeiten Weberei 2022-07-30 \n", "319 Tägliche Interne Wartungstätigkeiten Weberei 2022-08-04 \n", "340 Tägliche Interne Wartungstätigkeiten Weberei 2022-08-07 \n", "\n", " ErledigungsArtText \\\n", "288 Intern UTT - Sichtkontrolle \n", "289 Intern UTT - Sichtkontrolle \n", "318 Intern UTT - Sichtkontrolle \n", "319 Intern UTT - Sichtkontrolle \n", "340 Intern UTT - Sichtkontrolle \n", "\n", " ErledigungsBeschreibung MPMelderArbeitsplatz \\\n", "288 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "289 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... 
NaN \n", "318 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "319 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "340 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "\n", " MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n", "288 NaN 2022-04-01 2022-02-17 \n", "289 NaN 2022-04-09 2022-02-17 \n", "318 NaN 2022-07-30 2022-04-28 \n", "319 NaN 2022-08-04 2022-04-28 \n", "340 NaN 2022-08-07 2022-08-05 " ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp.head()" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "# schaue welche Merkmale abweichend sind\n", "analyse_columns = ['ObjektID', 'VorgangsTypID', 'VorgangsTypName']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ObjektID" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 187, 177, 249, 2654, 1792, 272, 271, 270, 269, 268, 186,\n", " 178, 179, 2317, 2318, 2473, 2559, 1244, 240, 241, 180, 220,\n", " 221, 222, 223, 224, 961, 962, 2166, 3212, 267, 266, 181,\n", " 182, 213, 214, 174, 175, 176, 156, 157, 158, 247, 248,\n", " 183, 265, 278, 1793, 1794, 218, 217, 219, 215, 216, 2319,\n", " 2320, 228, 184, 152, 153, 2165, 154, 155, 159, 167, 168,\n", " 169, 2313, 2314, 2315, 2316, 212, 211, 160, 161, 162, 164,\n", " 165, 166, 264, 273, 274, 277, 276, 275, 279, 280, 281,\n", " 282, 283, 242, 243, 244, 245, 246, 225, 227, 229, 170,\n", " 171, 172, 173, 230, 231, 3213, 3211, 3214], dtype=int64)" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp['ObjektID'].unique()" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [], "source": [ "filt = temp['ObjektID'] == 2318\n", "temp_fil1 = temp[filt]" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
\n", "
" ], "text/plain": [ " VorgangsID ObjektID HObjektText \\\n", "878 269743 2318 A067, Webmaschine, DL 280 EMS Breite 280 \n", "6099 152490 2318 A067, Webmaschine, DL 280 EMS Breite 280 \n", "13905 152476 2318 A067, Webmaschine, DL 280 EMS Breite 280 \n", "14019 248301 2318 A067, Webmaschine, DL 280 EMS Breite 280 \n", "14211 254914 2318 A067, Webmaschine, DL 280 EMS Breite 280 \n", "\n", " ObjektArtID ObjektArtText VorgangsTypID VorgangsTypName \\\n", "878 3 Luft-Webmaschine 1 Wartung \n", "6099 3 Luft-Webmaschine 1 Wartung \n", "13905 3 Luft-Webmaschine 1 Wartung \n", "14019 3 Luft-Webmaschine 1 Wartung \n", "14211 3 Luft-Webmaschine 1 Wartung \n", "\n", " VorgangsDatum VorgangsStatusId VorgangsPrioritaet \\\n", "878 2022-10-31 5 0 \n", "6099 2022-03-24 5 0 \n", "13905 2022-03-10 5 0 \n", "14019 2022-04-28 5 0 \n", "14211 2022-05-19 5 0 \n", "\n", " VorgangsBeschreibung VorgangsOrt \\\n", "878 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "6099 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "13905 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "14019 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "14211 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "\n", " VorgangsArtText ErledigungsDatum \\\n", "878 Tägliche Interne Wartungstätigkeiten Weberei 2022-10-31 \n", "6099 Tägliche Interne Wartungstätigkeiten Weberei 2022-03-24 \n", "13905 Tägliche Interne Wartungstätigkeiten Weberei 2022-03-10 \n", "14019 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-28 \n", "14211 Tägliche Interne Wartungstätigkeiten Weberei 2022-05-19 \n", "\n", " ErledigungsArtText \\\n", "878 Intern UTT - Sichtkontrolle \n", "6099 Intern UTT - Sichtkontrolle \n", "13905 Intern UTT - Sichtkontrolle \n", "14019 Intern UTT - Sichtkontrolle \n", "14211 Intern UTT - Sichtkontrolle \n", "\n", " ErledigungsBeschreibung MPMelderArbeitsplatz \\\n", "878 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... 
NaN \n", "6099 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "13905 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "14019 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "14211 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "\n", " MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n", "878 NaN 2022-10-31 2022-08-05 \n", "6099 NaN 2022-03-24 2022-02-17 \n", "13905 NaN 2022-03-10 2022-02-17 \n", "14019 NaN 2022-04-28 2022-04-14 \n", "14211 NaN 2022-05-19 2022-04-28 " ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp_fil1.head()" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "['2022-10-31 00:00:00', '2022-03-24 00:00:00', '2022-03-10 00:00:00',\n", " '2022-04-28 00:00:00', '2022-05-19 00:00:00', '2022-04-09 00:00:00',\n", " '2022-04-21 00:00:00', '2022-06-11 00:00:00', '2022-05-12 00:00:00',\n", " '2022-04-23 00:00:00',\n", " ...\n", " '2022-10-28 00:00:00', '2022-07-06 00:00:00', '2023-06-14 00:00:00',\n", " '2022-10-29 00:00:00', '2022-07-07 00:00:00', '2023-06-15 00:00:00',\n", " '2022-05-05 00:00:00', '2022-10-30 00:00:00', '2022-07-08 00:00:00',\n", " '2022-10-19 00:00:00']\n", "Length: 462, dtype: datetime64[ns]" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp_fil1['VorgangsDatum'].unique()" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "462" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(temp_fil1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "VorgangsID" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Anzahl einzigartiger VorgangsID 1855 mit Anteil am Gesamtdatensatz 3.89 %\n" ] } ], "source": [ "uni_VorgangsID = 
temp['VorgangsID'].unique()\n", "num_uni_VorgangsID = len(uni_VorgangsID)\n", "print(f'Anzahl einzigartiger VorgangsID {num_uni_VorgangsID} mit Anteil am Gesamtdatensatz {num_uni_VorgangsID / len(temp) * 100:.2f} %')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "155717" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "uni_VorgangsID[0]" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "filt = temp['VorgangsID'] == uni_VorgangsID[0]\n", "temp_fil1 = temp[filt]" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
\n", "
" ], "text/plain": [ " VorgangsID ObjektID HObjektText ObjektArtID \\\n", "288 155717 187 246, Webmaschine Jacquard, 6 \n", "2718 155717 1792 A057, Webmaschine Jacquard, 6 \n", "2719 155717 186 245 J, Webmaschine Jacquard, 6 \n", "2720 155717 2473 A056, Webmaschine Jacquard, 6 \n", "5504 155717 2559 A070, Webmaschine Jacquard, 6 \n", "5505 155717 961 A054, Webmaschine Jacquard, 6 \n", "5506 155717 962 A055, Webmaschine Jacquard, 6 \n", "5507 155717 2166 A061, Webmaschine Jacquard, 6 \n", "5508 155717 1793 A058, Webmaschine Jacquard, 6 \n", "5509 155717 1794 A059, Webmaschine Jacquard, 6 \n", "8294 155717 2165 A060, Webmaschine Jacquard, 6 \n", "\n", " ObjektArtText VorgangsTypID VorgangsTypName VorgangsDatum \\\n", "288 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "2718 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "2719 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "2720 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "5504 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "5505 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "5506 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "5507 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "5508 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "5509 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "8294 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "\n", " VorgangsStatusId VorgangsPrioritaet \\\n", "288 5 0 \n", "2718 5 0 \n", "2719 5 0 \n", "2720 5 0 \n", "5504 5 0 \n", "5505 5 0 \n", "5506 5 0 \n", "5507 5 0 \n", "5508 5 0 \n", "5509 5 0 \n", "8294 5 0 \n", "\n", " VorgangsBeschreibung VorgangsOrt \\\n", "288 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "2718 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "2719 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "2720 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "5504 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "5505 Tägliche Wartungstätigkeiten nach Vorgabe des ... 
NaN \n", "5506 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "5507 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "5508 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "5509 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "8294 Tägliche Wartungstätigkeiten nach Vorgabe des ... NaN \n", "\n", " VorgangsArtText ErledigungsDatum \\\n", "288 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "2718 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "2719 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "2720 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "5504 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "5505 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "5506 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "5507 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "5508 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "5509 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "8294 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "\n", " ErledigungsArtText \\\n", "288 Intern UTT - Sichtkontrolle \n", "2718 Intern UTT - Sichtkontrolle \n", "2719 Intern UTT - Sichtkontrolle \n", "2720 Intern UTT - Sichtkontrolle \n", "5504 Intern UTT - Sichtkontrolle \n", "5505 Intern UTT - Sichtkontrolle \n", "5506 Intern UTT - Sichtkontrolle \n", "5507 Intern UTT - Sichtkontrolle \n", "5508 Intern UTT - Sichtkontrolle \n", "5509 Intern UTT - Sichtkontrolle \n", "8294 Intern UTT - Sichtkontrolle \n", "\n", " ErledigungsBeschreibung MPMelderArbeitsplatz \\\n", "288 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "2718 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "2719 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "2720 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "5504 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... 
NaN \n", "5505 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "5506 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "5507 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "5508 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "5509 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "8294 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... NaN \n", "\n", " MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n", "288 NaN 2022-04-01 2022-02-17 \n", "2718 NaN 2022-04-01 2022-02-17 \n", "2719 NaN 2022-04-01 2022-02-17 \n", "2720 NaN 2022-04-01 2022-02-17 \n", "5504 NaN 2022-04-01 2022-02-17 \n", "5505 NaN 2022-04-01 2022-02-17 \n", "5506 NaN 2022-04-01 2022-02-17 \n", "5507 NaN 2022-04-01 2022-02-17 \n", "5508 NaN 2022-04-01 2022-02-17 \n", "5509 NaN 2022-04-01 2022-02-17 \n", "8294 NaN 2022-04-01 2022-02-17 " ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp_fil1" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Anzahl Einträge mit gewählter VorgangsID: 11\n", "Anzahl einzigartiger ObjektIDs darunter: 11\n" ] } ], "source": [ "temp_fil2 = temp_fil1.fillna(value=False)\n", "print(f'Anzahl Einträge mit gewählter VorgangsID: {len(temp_fil2)}')\n", "uni_obj_id = len(temp_fil2['ObjektID'].unique())\n", "print(f'Anzahl einzigartiger ObjektIDs darunter: {uni_obj_id}')" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 187, 1792, 186, 2473, 2559, 961, 962, 2166, 1793, 1794, 2165],\n", " dtype=int64)" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp_fil2['ObjektID'].unique()" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
55091557171794A059, Webmaschine Jacquard,6Jacquard-Webmaschine1Wartung2022-04-0150Tägliche Wartungstätigkeiten nach Vorgabe des ...FalseTägliche Interne Wartungstätigkeiten Weberei2022-04-01Intern UTT - SichtkontrolleSichtkontrolle durchgeführt\\n\\nAuffälligkeiten...FalseFalse2022-04-012022-02-17
82941557172165A060, Webmaschine Jacquard,6Jacquard-Webmaschine1Wartung2022-04-0150Tägliche Wartungstätigkeiten nach Vorgabe des ...FalseTägliche Interne Wartungstätigkeiten Weberei2022-04-01Intern UTT - SichtkontrolleSichtkontrolle durchgeführt\\n\\nAuffälligkeiten...FalseFalse2022-04-012022-02-17
\n", "
" ], "text/plain": [ " VorgangsID ObjektID HObjektText ObjektArtID \\\n", "288 155717 187 246, Webmaschine Jacquard, 6 \n", "2718 155717 1792 A057, Webmaschine Jacquard, 6 \n", "2719 155717 186 245 J, Webmaschine Jacquard, 6 \n", "2720 155717 2473 A056, Webmaschine Jacquard, 6 \n", "5504 155717 2559 A070, Webmaschine Jacquard, 6 \n", "5505 155717 961 A054, Webmaschine Jacquard, 6 \n", "5506 155717 962 A055, Webmaschine Jacquard, 6 \n", "5507 155717 2166 A061, Webmaschine Jacquard, 6 \n", "5508 155717 1793 A058, Webmaschine Jacquard, 6 \n", "5509 155717 1794 A059, Webmaschine Jacquard, 6 \n", "8294 155717 2165 A060, Webmaschine Jacquard, 6 \n", "\n", " ObjektArtText VorgangsTypID VorgangsTypName VorgangsDatum \\\n", "288 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "2718 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "2719 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "2720 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "5504 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "5505 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "5506 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "5507 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "5508 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "5509 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "8294 Jacquard-Webmaschine 1 Wartung 2022-04-01 \n", "\n", " VorgangsStatusId VorgangsPrioritaet \\\n", "288 5 0 \n", "2718 5 0 \n", "2719 5 0 \n", "2720 5 0 \n", "5504 5 0 \n", "5505 5 0 \n", "5506 5 0 \n", "5507 5 0 \n", "5508 5 0 \n", "5509 5 0 \n", "8294 5 0 \n", "\n", " VorgangsBeschreibung VorgangsOrt \\\n", "288 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n", "2718 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n", "2719 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n", "2720 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n", "5504 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n", "5505 Tägliche Wartungstätigkeiten nach Vorgabe des ... 
False \n", "5506 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n", "5507 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n", "5508 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n", "5509 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n", "8294 Tägliche Wartungstätigkeiten nach Vorgabe des ... False \n", "\n", " VorgangsArtText ErledigungsDatum \\\n", "288 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "2718 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "2719 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "2720 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "5504 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "5505 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "5506 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "5507 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "5508 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "5509 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "8294 Tägliche Interne Wartungstätigkeiten Weberei 2022-04-01 \n", "\n", " ErledigungsArtText \\\n", "288 Intern UTT - Sichtkontrolle \n", "2718 Intern UTT - Sichtkontrolle \n", "2719 Intern UTT - Sichtkontrolle \n", "2720 Intern UTT - Sichtkontrolle \n", "5504 Intern UTT - Sichtkontrolle \n", "5505 Intern UTT - Sichtkontrolle \n", "5506 Intern UTT - Sichtkontrolle \n", "5507 Intern UTT - Sichtkontrolle \n", "5508 Intern UTT - Sichtkontrolle \n", "5509 Intern UTT - Sichtkontrolle \n", "8294 Intern UTT - Sichtkontrolle \n", "\n", " ErledigungsBeschreibung MPMelderArbeitsplatz \\\n", "288 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n", "2718 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n", "2719 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n", "2720 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n", "5504 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... 
False \n", "5505 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n", "5506 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n", "5507 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n", "5508 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n", "5509 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n", "8294 Sichtkontrolle durchgeführt\\n\\nAuffälligkeiten... False \n", "\n", " MPAbteilungBezeichnung Arbeitsbeginn ErstellungsDatum \n", "288 False 2022-04-01 2022-02-17 \n", "2718 False 2022-04-01 2022-02-17 \n", "2719 False 2022-04-01 2022-02-17 \n", "2720 False 2022-04-01 2022-02-17 \n", "5504 False 2022-04-01 2022-02-17 \n", "5505 False 2022-04-01 2022-02-17 \n", "5506 False 2022-04-01 2022-02-17 \n", "5507 False 2022-04-01 2022-02-17 \n", "5508 False 2022-04-01 2022-02-17 \n", "5509 False 2022-04-01 2022-02-17 \n", "8294 False 2022-04-01 2022-02-17 " ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp_fil2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Question: Can multiple ObjektIDs be assigned to a single process (`VorgangsID`)? If so, why do the completion dates (`ErledigungsDatum`) differ?*" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Length of the descriptions**" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [], "source": [ "descriptions = descriptions.to_frame()\n", "# number of characters per description\n", "descriptions['length_description'] = descriptions['VorgangsBeschreibung'].str.len()\n", "descriptions = descriptions.sort_values(by=['length_description'], ascending=False)" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 124008.000000\n", "mean 70.351751\n", "std 53.080901\n", "min 1.000000\n", "25% 66.000000\n", "50% 66.000000\n", "75% 67.000000\n", "max 3137.000000\n", "Name: length_description, dtype: float64" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# summary statistics of the description lengths\n", "len_descr = descriptions['length_description']\n", "len_descr.describe()" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " VorgangsBeschreibung length_description\n", "8704 Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /... 3137\n", "7826 Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /... 3137\n", "49779 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311\n", "124118 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311\n", "14853 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "descriptions.head()" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " VorgangsBeschreibung length_description\n", "8704 Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /... 3137\n", "7826 Vorgaben aus Held Wartungsplan\\n\\nLC-X-Achse /... 3137\n", "49779 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311\n", "124118 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311\n", "14853 Laut Wartungsvertrag (Hr.Radtke) Bestellnummer... 2311\n", "... ... ...\n", "13450 1\n", "13451 1\n", "29979 1\n", "13452 1\n", "21214 \\n 1\n", "\n", "[124008 rows x 2 columns]" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "descriptions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.8" } }, "nbformat": 4, "nbformat_minor": 4 }