import "./blogStyle.css"
import "./blogEntry4.css"
import YoutubeEmbed from "./youtubeEmbed"

export const Entry4 = () => {
    return(
        <div className="post">
            <div className="header">
                <h1>
                    Janky Automated Video Stabilization
                </h1>
                <h3>GWU CSCI 4527</h3>
                <h3>3/28/24 - Owen Wolff</h3>
            </div>

            <div id="intro">

                <h1 className="centerText section">
                    I Have Always Been A Cryptid Enthusiast
                </h1>

                <div className="flexrow vertCenter">
                    <img src="/blogAssets/entry4/hermesNoBg.png" alt="my friend hermes" id="hermes"></img>
                    <div className="flexcol">
                        <p className="textBody">
                            If you didn't know, a cryptid is a creature that exists in rumour and folktale, but hasn't been proven to exist (Loch Ness monster, Bigfoot, Richard Nixon, etc.).
                            For this project, I originally wanted to stabilize <a href="https://www.youtube.com/watch?v=41adIfM7CWU" className="hyperlink" target="_blank" rel="noopener noreferrer">that one famous video of Bigfoot</a>,
                            but that video proved too blurry and shaky to automate detecting Bigfoot and matching overlapping points between frames. In lieu of "real" footage
                            of a cryptid, I decided to manufacture my own. Enter my friend, Hermes the turkey (image on the left).
                        </p>
                        <br/>
                        <p className="textBody">
                            Hermes, intended to be a doorstop, holds my headphones and glasses for me when I am not using them. For the purposes of this
                            project, he will be learning to fly (thrown at my ceiling).
                        </p>

                    </div>
                </div>

            </div>

            <h2 className="section">Learning To Fly (input)</h2>

            <div className="flexrow">
                <div className="textBody leftText vertCenter">
                    <p>
                        I will be stabilizing the video on the right by interpolating each frame onto the larger still image beside it.
                        I set up four markers on my wall to serve as the anchor points for
                        the perspective transformation.
                    </p>
                </div>
                <div id="doubleUp">
                    <img src="/blogAssets/entry4/dotsBG.jpg" alt="wall of my dorm, larger frame than video" className="quarter"></img>
                    <video width="40%" controls>
                        <source src="/blogAssets/entry4/dotsVid.mp4" type="video/mp4" />
                        Your browser does not support the video tag.
                    </video>
                </div>
            </div>

            <h2 className="section">Step 1: Get Them Anchors</h2>
            <div className="flexrow marginalize">
                <img src="/blogAssets/entry4/markers.jpg" alt="mask over each marker" className="smallerVertFrame"></img>
                <p className="textBody vertCenter">
                    The first step to stabilizing my video is to build a way to dynamically get the position of my anchor points
                    in each frame. I experimented with different kinds of anchor points taped to my wall, but settled on the red blobs because
                    of how well they contrasted with the rest of the scene. Taking advantage of this contrast made finding these points real easy.
                    All I had to do was apply a simple color filter to the image to get a mask over each marker, then return the centroid of each mask. Considering the simplistic
                    approach, I thought it worked out pretty well.
                </p>
            </div>
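            {/*
            A minimal sketch of the Step 1 idea (color filter, then one centroid per blob), written in
            plain JavaScript. This is not the project's actual code: the function name `findRedCentroids`,
            the RGB thresholds, and the flood-fill grouping are all assumptions made for illustration.

```javascript
// Hypothetical helper: find the centroid of each red blob in an RGBA pixel buffer.
// `pixels` is laid out like ImageData.data (4 bytes per pixel, row-major).
function findRedCentroids(pixels, width, height, minArea = 1) {
    // Assumed color filter: strongly red, weakly green and blue.
    const isRed = (i) => pixels[i] > 150 && pixels[i + 1] < 90 && pixels[i + 2] < 90;

    const visited = new Uint8Array(width * height);
    const centroids = [];

    for (let start = 0; start < width * height; start++) {
        if (visited[start] || !isRed(start * 4)) continue;

        // Flood fill one 4-connected blob, accumulating coordinate sums.
        let sumX = 0, sumY = 0, area = 0;
        const stack = [start];
        visited[start] = 1;
        while (stack.length > 0) {
            const p = stack.pop();
            const x = p % width;
            const y = (p - x) / width;
            sumX += x;
            sumY += y;
            area++;

            const neighbors = [];
            if (x > 0) neighbors.push(p - 1);
            if (x < width - 1) neighbors.push(p + 1);
            if (y > 0) neighbors.push(p - width);
            if (y < height - 1) neighbors.push(p + width);
            for (const n of neighbors) {
                if (!visited[n] && isRed(n * 4)) {
                    visited[n] = 1;
                    stack.push(n);
                }
            }
        }
        // Small blobs can be dropped as noise by raising minArea.
        if (area >= minArea) centroids.push({ x: sumX / area, y: sumY / area, area });
    }
    return centroids;
}
```

            In a browser, `pixels` could come from `ctx.getImageData(0, 0, width, height).data` on a canvas
            that has the video frame drawn onto it.
            */}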
            <h2 className="section">Step 2: Profit</h2>
            <div className="flexrow marginalize vertCenter">
                <div className="flexcol">
                    <p className="textBody vertCenter">
                        This is actually enough information to perform a basic stabilization. Using the anchor point positions, I calculated the
                        perspective transformation matrix, which I then applied to the frame I wanted to interpolate. With the source frame now in
                        destination (bigger image) coordinates, I turned the transformed frame into a mask, which could be put on top of the destination
                        image.
                    </p>
                    <p className="textBody vertCenter">
                        By writing the masked destination image to a new file and repeating this process for each frame in the original video, I
                        get a series of new images that can be compiled into a video. In some of the frames Hermes obscured the marker
                        blobs, which resulted in bad interpolations, but overall I am pleased with the result.
                    </p>
                </div>
                <YoutubeEmbed embedId="-TTQR8MS1Ew" className="center"/>
            </div>
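            {/*
            A rough sketch of how a perspective transformation matrix can be computed from four anchor
            point pairs, in plain JavaScript. It is an illustration under assumptions, not the project's
            code: the names `solveLinearSystem`, `perspectiveFromPoints`, and `applyPerspective` are
            hypothetical, and the bottom-right matrix entry is fixed to 1 (the usual normalization).

```javascript
// Solve A * h = b with Gauss-Jordan elimination and partial pivoting.
function solveLinearSystem(A, b) {
    const n = b.length;
    const M = A.map((row, i) => [...row, b[i]]);
    for (let col = 0; col < n; col++) {
        let pivot = col;
        for (let r = col + 1; r < n; r++) {
            if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
        }
        if (Math.abs(M[pivot][col]) < 1e-12) throw new Error("degenerate anchor points");
        const tmp = M[col];
        M[col] = M[pivot];
        M[pivot] = tmp;
        for (let r = 0; r < n; r++) {
            if (r === col) continue;
            const f = M[r][col] / M[col][col];
            for (let c = col; c <= n; c++) M[r][c] -= f * M[col][c];
        }
    }
    return M.map((row, i) => row[n] / row[i]);
}

// Build the 3x3 matrix that maps each src[i] onto dst[i] (four {x, y} pairs).
function perspectiveFromPoints(src, dst) {
    const A = [];
    const b = [];
    for (let i = 0; i < 4; i++) {
        const { x, y } = src[i];
        const { x: u, y: v } = dst[i];
        // Two linear equations per point pair, eight unknowns in total.
        A.push([x, y, 1, 0, 0, 0, -u * x, -u * y]);
        b.push(u);
        A.push([0, 0, 0, x, y, 1, -v * x, -v * y]);
        b.push(v);
    }
    const h = solveLinearSystem(A, b);
    return [[h[0], h[1], h[2]], [h[3], h[4], h[5]], [h[6], h[7], 1]];
}

// Map one point through the matrix, dividing by the homogeneous coordinate.
function applyPerspective(H, { x, y }) {
    const w = H[2][0] * x + H[2][1] * y + H[2][2];
    return {
        x: (H[0][0] * x + H[0][1] * y + H[0][2]) / w,
        y: (H[1][0] * x + H[1][1] * y + H[1][2]) / w,
    };
}
```

            Here `src` would be the four marker centroids in a video frame and `dst` the same four markers
            in the bigger image, listed in the same order.
            */}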
            <h2 className="section">Step 3: Make it More Difficult</h2>
            <div className="flexrow marginalize vertCenter">
                <img src="/blogAssets/entry4/turkeyMask.jpg" alt="mask over hermes" className="smallerVertFrame"></img>
                <div className="flexcol">
                    <p className="textBody">
                        I wanted to see if I could do a fancier technique where I only interpolate Hermes into the image instead of the entire frame.
                        To do this, I needed a way to dynamically find Hermes in a given frame. I approached this the same way
                        I handled the anchor points. Unfortunately, Hermes' plumage does not contrast as well as my red blobs against the scene.
                    </p>
                    <br/>
                    <p className="textBody">
                        To work around this, I made some assumptions about Hermes' location in the scene. For example, I asserted that he would
                        always be between the anchor points on the x axis and that he would never be above (technically below in image coordinates) 
                        some threshold on the y axis. With those assumptions, I was able to create a general mask over a whole bunch of stuff in the scene
                        and then choose the mask that best fit my assumptions.
                    </p>
                    <br/>
                    <p className="textBody">
                        I toyed with the idea of using YOLO (You Only Look Once) to identify Hermes out of a set of masks instead, but the base
                        YOLO model checkpoint was not very good at identifying Hermes, and I had neither the time nor the will to train my own model.
                        Besides, I think my janky naive approach worked <span className="italic">well enough</span>.
                    </p>
                    <br/>
                    <p className="textBody">
                        SIDENOTE: I really like this image.
                    </p>
                </div>
            </div>
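            {/*
            A small sketch of picking one candidate mask using the location assumptions from Step 3
            (between the anchors on x, past a threshold on y in image coordinates). The function name
            `pickHermes`, the blob shape, and the "largest area wins" tie-break are assumptions for
            illustration, not the project's actual selection rule.

```javascript
// blobs:   candidate masks summarized as { x, y, area } (centroid plus pixel count)
// anchors: the four marker centroids as { x, y }
// yLimit:  smallest image-space y the target is assumed to reach
function pickHermes(blobs, anchors, yLimit) {
    const minX = Math.min(...anchors.map((a) => a.x));
    const maxX = Math.max(...anchors.map((a) => a.x));

    // Keep only blobs that satisfy both location assumptions.
    const candidates = blobs.filter((b) => b.x > minX && b.x < maxX && b.y >= yLimit);
    if (candidates.length === 0) return null;

    // Assumed tie-break: the biggest remaining blob is the target.
    return candidates.reduce((best, b) => (b.area > best.area ? b : best));
}
```

            Returning null when nothing fits lets the caller skip that frame instead of compositing a
            wrong region.
            */}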
            <h2 className="section">Step 4: Cage The Animal (Using A Bounding Box)</h2>
            <div className="flexrow marginalize vertCenter">
                <div className="flexcol leftText">
                    <p className="textBody ">
                        With Hermes successfully* masked, I can find the bounding box around him. This bounding box is the subsection of the original
                        frame that will be interpolated onto the larger image. One thing I learned from this project is that interpolating a subsection 
                        of an image onto another image (specifically when the subsection does not include the anchor points) requires a lot more math 
                        than simply interpolating the whole frame. 
                    </p>
                    <p className="textBody">
                        Without getting too specific,
                        I calculated the perspective transformation the same way as before, applied it to the bounding box, then copied the
                        pixels within the transformed bounding box from the source frame onto the background image.
                    </p>
                </div>
                <img src="/blogAssets/entry4/cropped_image.png" alt="hermes, bound" className="smallerVertFrame"></img>
            </div>
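            {/*
            A sketch of the "apply the perspective transformation to the bounding box" part of Step 4,
            assuming a 3x3 perspective matrix `H` is already available as nested arrays. The names
            `applyPerspective` and `transformBoundingBox` and the { left, top, right, bottom } box shape
            are hypothetical. It only maps the corners and finds the enclosing rectangle in the
            background image; the pixel copy itself is left out.

```javascript
// Map one point through the matrix, dividing by the homogeneous coordinate.
function applyPerspective(H, { x, y }) {
    const w = H[2][0] * x + H[2][1] * y + H[2][2];
    return {
        x: (H[0][0] * x + H[0][1] * y + H[0][2]) / w,
        y: (H[1][0] * x + H[1][1] * y + H[1][2]) / w,
    };
}

// Transform the four corners of a source-frame box into background coordinates.
function transformBoundingBox(H, box) {
    const corners = [
        { x: box.left, y: box.top },
        { x: box.right, y: box.top },
        { x: box.right, y: box.bottom },
        { x: box.left, y: box.bottom },
    ].map((p) => applyPerspective(H, p));

    const xs = corners.map((p) => p.x);
    const ys = corners.map((p) => p.y);
    return {
        corners, // the warped quadrilateral, generally no longer a rectangle
        bounds: {
            // axis-aligned rectangle that encloses the warped quadrilateral
            left: Math.floor(Math.min(...xs)),
            top: Math.floor(Math.min(...ys)),
            right: Math.ceil(Math.max(...xs)),
            bottom: Math.ceil(Math.max(...ys)),
        },
    };
}
```

            The `bounds` rectangle limits which background pixels need to be visited when copying Hermes
            over, which is cheaper than scanning the whole image.
            */}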
            <h2 className="section">Step 5: Profit Again</h2>
            <div className="flexrow marginalize">
                <YoutubeEmbed embedId="6AD_zMePA90" className="center"/>
                <p className="textBody vertCenter">
                    With our interpolation done, we can compile the series of new frames and we get something that looks sorta like a stabilized video.
                </p>
            </div>
            <h2 className="section">Step 6: Brag And Gloat</h2>
            <div className="fullcenter marginalizeBoth">
                <div className="flexcol">
                    <p className="textBody">
                        As you can see, the fancy stabilization didn't work very well. I imagine that my naive approach to identifying Hermes
                        caused a lot of the artifacts present in the video. To my credit, when my Hermes detection <span className="italic">did</span> work,
                        the interpolation didn't look that bad. Going forward, it might be worthwhile to think about more robust approaches to identifying
                        target objects in a scene.
                        This whole project can be found on <a className="hyperlink" href="https://github.com/scaboodles/CS4527Proj4.git">GitHub</a>.
                    </p>
                    <p className="textBody">
                        I realize that this project did not center around a dataset of more than 1,000 images, which is a questionable decision on my part,
                        since that was one of the only constraints we were given. That said, I really wanted to apply what I learned about geometric
                        computer vision principles, I had a lot of fun working on this project, and I think it turned out kinda cool. For these reasons,
                        I hope you consider giving me some leeway in this regard.
                    </p>
                </div>
            </div>
            <br/>
            <br/>
            <br/>
            <br/>
            <br/>
        </div>
    )
}