f6dZdZddlZddlmZddlmZddlZddlmZm Z ddl m Z ddl Z ddl Z ddlZddlZddlZddlZddlZddlZdZdd ZGd d eZd Zd ZdZddZddZddZddZddZedk(r!eej>jAyy)z=Diagnostic functions, mainly for use when doing tech support.MITN)BytesIO) HTMLParser) BeautifulSoup __version__)builder_registryc tdtztdtjzgd}|D]F}tj D]}||j vs'|j|td|zHd|vrM|jd ddl m }td d jtt|jzd |vr dd l}td|jzt#|dr|j%}|D]V}td|zd} t'||} d}|r'td|zt j/tdXy #t$r}td Yd }~d }~wwxYw#t$r}tdYd }~d }~wwxYw#t($r,}td|zt+j,Yd }~d }~wwxYw)zDiagnostic suite for isolating common problems. :param data: A string containing markup that needs to be explained. :return: None; diagnostics are printed to standard output. z'Diagnostic running on Beautiful Soup %szPython version %s) html.parserhtml5liblxmlz;I noticed that %s is not installed. Installing it may help.r zlxml-xmlretreezFound lxml version %s.z.lxml is not installed or couldn't be imported.Nr zFound html5lib version %sz2html5lib is not installed or couldn't be imported.readz#Trying to parse your markup with %sF)featuresT%s could not parse the markup.z#Here's what %s did with the markup:zP--------------------------------------------------------------------------------)printrsysversionrbuildersrremoveappendr rjoinmapstr LXML_VERSION ImportErrorr hasattrrr Exception traceback print_excprettify) data basic_parsersnamebuilderrer parsersuccesssoups U/var/lib/jenkins/workspace/mettalog/venv/lib/python3.12/site-packages/bs4/diagnose.pydiagnoser,s  4{ BD  ,.7M'00Gw'''1   & M  Z( B " *SXXc#e>P>P6Q-RR T ]" F  .1E1EE G tVyy{ 4v=? " 7DG  86A C 4==? $ x ! B @ B B B F D F F F " 3f< >    ! ! "sH;E(F "F*( F1 FF F' F""F'* G3"GGc ddlm}|jdd}t|tr|j d}t |}|j|f||d|D]-\}}t|d|jdd|j/y ) aPrint out the lxml events that occur during parsing. This lets you see how lxml parses a document when no Beautiful Soup code is running. You can use this to determine whether an lxml-specific problem is in Beautiful Soup's lxml tree builders or in lxml itself. :param data: Some markup. :param html: If True, markup will be parsed with lxml's HTML parser. if False, lxml's XML parser will be used. rr recoverTutf8)htmlr.z, z>4N) r rpop isinstancerencoder iterparsertagtext)r#r0kwargsrr.readereventelements r+ lxml_tracer;NsjjD)G${{6" T]F)%//7.4w w{{GLLACcLeZdZdZdZdZdZdZdZdZ dZ d Z d Z d Z y ) AnnouncingParserzSubclass of HTMLParser that announces parse events, without doing anything else. You can use this to get a picture of how html.parser sees a given document. The easiest way to do this is to call `htmlparser_trace`. ct|y)N)r)selfss r+_pzAnnouncingParser._pls  ar<c,|jd|zy)Nz%s STARTrB)r@r%attrss r+handle_starttagz AnnouncingParser.handle_starttagos  T!"r<c,|jd|zy)Nz%s ENDrDr@r%s r+ handle_endtagzAnnouncingParser.handle_endtagrs 4 r<c,|jd|zy)Nz%s DATArDr@r#s r+ handle_datazAnnouncingParser.handle_datau  D !r<c,|jd|zy)Nz %s CHARREFrDrHs r+handle_charrefzAnnouncingParser.handle_charrefx  t#$r<c,|jd|zy)Nz %s ENTITYREFrDrHs r+handle_entityrefz!AnnouncingParser.handle_entityref{s %&r<c,|jd|zy)Nz %s COMMENTrDrKs r+handle_commentzAnnouncingParser.handle_comment~rPr<c,|jd|zy)Nz%s DECLrDrKs r+ handle_declzAnnouncingParser.handle_declrMr<c,|jd|zy)Nz%s UNKNOWN-DECLrDrKs r+ unknown_declzAnnouncingParser.unknown_decls !D()r<c,|jd|zy)Nz%s PIrDrKs r+ handle_pizAnnouncingParser.handle_pis $r<N)__name__ __module__ __qualname____doc__rBrFrIrLrOrRrTrVrXrZr<r+r>r>ds9#!"%'%"* r<r>c:t}|j|y)zPrint out the HTMLParser events that occur during parsing. This lets you see how HTMLParser parses a document when no Beautiful Soup code is running. :param data: Some markup. N)r>feed)r#r(s r+htmlparser_tracerbs F KKr<aeioubcdfghjklmnpqrstvwxyzcd}t|D]/}|dzdk(rt}nt}|tj|z }1|S)z#Generate a random word-like string.r)range _consonants_vowelsrandomchoice)lengthrAits r+rwordrpsD A 6] q5A:AA V]]1   Hr<cDdjdt|DS)z'Generate a random sentence-like string. c3ZK|]#}ttjdd%yw) N)rprkrandint).0rns r+ zrsentence..s F1E&..1-.Fs)+)rrh)rms r+ rsentencerys 88Ff F FFr<c gd}g}t|D]}tjdd}|dk(r*tj|}|j d|zH|dk(r/|j t tjdd||dk(stj|}|j d|zd d j |zd zS) z+Randomly generate an invalid HTML document.)pdivspanrnbscripttablerz<%s>rtrgzz z)rhrkrvrlrryr) num_elements tag_nameselementsrnrltag_names r+rdocrsAIH < !$ Q;}}Y/H OOFX- . q[ OOIfnnQq&9: ; q[}}Y/H OOGh. /! dii) )I 55r<ctdtzt|}tdt|zdddgddfD]Q}d} t j}t ||}t j}d}|s?td |z fzSd d l m }t j}|j|t j}td||z zd d l } | j}t j}|j|t j}td||z zy #t $r,}td |ztjYd }~d }~wwxYw)z.Very basic head-to-head performance benchmark.z1Comparative parser benchmark on Beautiful Soup %sz3Generated a large invalid HTML document (%d bytes).r r0r r FTrNz"BS4+%s parsed the markup in %.2fs.rr z$Raw lxml parsed the markup in %.2fs.z(Raw html5lib parsed the markup in %.2fs.)rrrlentimerrr r!r rHTMLr rparse) rr#r(r)ar*r~r'rr s r+benchmark_parsersrs> > LN  D @3t9 LNFF+ZG " A v.D AG  761Q3-G IH A JJt A 1QqS 9;  "F A LL A 51 =?# " 3f< >    ! ! "s6E E5 "E00E5ctj}|j}t|}t t ||}t jd|||tj|}|jd|jddy)z7Use Python's profiler on a randomly generated document.)bs4r#r(zbs4.BeautifulSoup(data, parser) cumulativez _html5lib|bs42N) tempfileNamedTemporaryFiler%rdictrcProfilerunctxpstatsStats sort_stats print_stats)rr( filehandlefilenamer#varsstatss r+profilersp,,.JH  D Cd6 2D OO5dHM LL "E \" or*r<__main__)T))rt)i)順)rr )!r^ __license__rior html.parserrrrr bs4.builderrosrrkrrr rr,r;r>rbrjrirpryrrrr[stdinrr_r<r+rsC " *(   6pD,$ z$ L  %   G6$@@ + z SYY^^ r<