import "./blogStyle.css"
import "./progRep.css"
import YoutubeEmbed from "./youtubeEmbed"

export const ProgressReport = () => {
    return(
        <div className="post">
            <div className="header">
                <h1>
                    Final Project Final Submission
                </h1>
                <h3>GWU CSCI 4527</h3>
                <h3>5/6/24 - Owen Wolff</h3>
            </div>

            <div className="intro">
                <br/>
                <h1 className="centerText ">
                    It Works!*
                </h1>

                <p className="centerText italic">
                    *not well
                </p>
                <br/>
            </div>
            <div className="wideCenter">
                <YoutubeEmbed embedId="1qGw1Q6pQKA" />
            </div>
            <br/>
            <p className="textBody large padout">
                For the second phase of this project I focused on a couple of things. Most apparent from the demo is the additional functionality
                driven by gesture detection. Since the last report, I switched from macOS accessibility hacks to pyautogui for executing
                system interactions. This is a much simpler solution and offers a little more responsiveness between processed frames. Less apparent from
                the demo is how much time I spent trying to optimize processing speed.
            </p>
            <h2 className="section">Misadventures In Batch Processing</h2>
            <p className="textBody large padout">
                The first optimization I tried was processing a series of frames in parallel. Overall, this did decrease mean processing time over
                a collection of images, but it was not a great solution for my specific application. See below.
            </p>
            <div className="wideCenter">
                <YoutubeEmbed embedId="_hhtU2A6xsM" />
            </div>
            <p className="textBody large padout">
                Even though batch processing let me run detection over more images, it resulted in a much less stable stream.
                Intermittent spikes and dips in performance are much harder to interact with than a consistent (albeit low) overall frame rate.
                It is possible that I just didn't implement it correctly, but I'd much rather blame Python for having slow threads. I scrapped
                batch processing for the final version of this project.
            </p>
            <h2 className="section">Model-Based Optimization</h2>
            <p className="textBody large padout">
                Another optimization angle I tried was reworking my YOLO model itself. First off, I beefed up my training data to eliminate false negatives
                and converted it to grayscale. By reducing 3 steps of convolution (one per color channel) to a single grayscale convolution,
                I managed to eke out a little more processing speed. My grayscale-trained models were a little less consistent in identifying hand gestures,
                but the extra speed was worth the compromise.
            </p>

            <p className="textBody large padout">
                To quantify "beefing up", I added about 600 manually annotated images to my dataset. To power through this soul-crushing experience, I ditched
                my crusty matplotlib annotation tool in favor of <a href="https://app.roboflow.com/">Roboflow</a>, which has a much prettier and more user-friendly
                web-based GUI.
            </p>
            <br/>
            <div className="centerContainer">
                <img className="defaultImage" src="/blogAssets/proposalAssets/annotation.png" alt="caveman style data annotation" />
            </div>
            <br/>
            <p className="textBody large padout">
                Another model-based optimization I tried was reducing the size of my YOLO model. This strategy required a little bit of tuning
                to balance accuracy and speed. Here is a demo of the first reduced model I trained, which could do a bad job very efficiently.
            </p>
            <div className="wideCenter">
                <YoutubeEmbed embedId="LhyCKQKXneU"/>
            </div>
            <br/>
            <br/>

            <h2 className="section">Minority Report It Ain't | <span className="italic">Takeaways and Conclusion</span></h2>
            <p className="textBody large padout">
                At the end of the day, I have built a system that not only allows you to look like a huge dork while using your computer,
                but also has the benefit of not always working. In all seriousness, my main takeaway from this project is that real-time
                image processing comes down to managing the difficult balance between accuracy and speed. A model that
                could find 100% of what it's looking for at 1 frame per second was just as useful as one that could find 1% at 100 frames per second.
                Fiddling with model generation felt very much like a zero-sum game. I guess I could always just rewrite everything in C.
            </p>
            <br/>
            <br/>
            <br/>
            <br/>
        </div>
    )
}