Second order approximation of the loss function (Deep learning book, 7.33)

In Goodfellow's (2016) book on deep learning, he discusses the equivalence of early stopping and L2 regularisation (https://www.deeplearningbook.org/contents/regularization.html, page 247).

The quadratic approximation of the cost function $J$ is given by:

$$\hat{J}(\theta)=J(w^*)+\frac{1}{2}(w-w^*)^TH(w-w^*)$$

where $H$ is the Hessian matrix (Eq. 7.33). Is this missing the middle term? The Taylor expansion should be:

$$f(w+\epsilon)=f(w)+f'(w)\cdot\epsilon+\frac{1}{2}f''(w)\cdot\epsilon^2$$



















Tags: neural-networks, deep-learning, loss-functions, derivative














edited 11 hours ago by Jan Kukacka















asked 12 hours ago by stevew


























1 Answer


















They talk about the weights at optimum:

> We can model the cost function $J$ with a quadratic approximation in the neighborhood of the empirically optimal value of the weights $w^*$

At that point, the first derivative (the gradient of $J$) is zero, so the middle term vanishes and is left out.
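To spell this step out, the general multivariate second-order Taylor expansion of $J$ around $w^*$ can be written in the question's notation as follows. The gradient symbol $\nabla_w J$ is introduced here for illustration and does not appear in the original post.

$$J(w)\approx J(w^*)+(w-w^*)^T\nabla_w J(w^*)+\frac{1}{2}(w-w^*)^TH(w-w^*)$$

If $w^*$ is a minimum of $J$, then $\nabla_w J(w^*)=0$, so the linear term drops out and what remains has the form of Eq. 7.33:

$$\hat{J}(\theta)=J(w^*)+\frac{1}{2}(w-w^*)^TH(w-w^*)$$

This matches the scalar expansion in the question, with $\epsilon=w-w^*$, $f'$ replaced by the gradient and $f''$ replaced by the Hessian $H$.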














answered 12 hours ago by Jan Kukacka
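As a rough numerical illustration of the point made in the answer, the sketch below compares the second-order Taylor expansion with and without the linear term. The toy cost, its minimiser `W_STAR`, and the helper `taylor2` are all hypothetical and chosen only so that the gradient and Hessian can be written by hand. None of this comes from the book.

```python
import numpy as np

# Toy cost with a known minimiser W_STAR (an assumed example, not from the book).
W_STAR = np.array([1.0, -0.5])


def cost(w):
    a, b = w[0] - W_STAR[0], w[1] - W_STAR[1]
    return a**4 + a**2 + a * b + 2.0 * b**2


def gradient(w):
    # Analytic gradient of the toy cost.
    a, b = w[0] - W_STAR[0], w[1] - W_STAR[1]
    return np.array([4.0 * a**3 + 2.0 * a + b, a + 4.0 * b])


def hessian(w):
    # Analytic Hessian of the toy cost.
    a = w[0] - W_STAR[0]
    return np.array([[12.0 * a**2 + 2.0, 1.0], [1.0, 4.0]])


def taylor2(w, w0, keep_linear=True):
    """Second-order Taylor expansion of cost around w0, evaluated at w."""
    d = w - w0
    linear = gradient(w0) @ d if keep_linear else 0.0
    return cost(w0) + linear + 0.5 * d @ hessian(w0) @ d


w = W_STAR + np.array([0.1, -0.2])

# Expanding around the optimum: the gradient is zero, so both versions agree.
full_at_opt = taylor2(w, W_STAR, keep_linear=True)
short_at_opt = taylor2(w, W_STAR, keep_linear=False)

# Expanding around a non-optimal point: dropping the linear term hurts.
w0 = W_STAR + np.array([0.3, 0.0])
full_elsewhere = taylor2(w, w0, keep_linear=True)
short_elsewhere = taylor2(w, w0, keep_linear=False)

print("gradient at w*:", gradient(W_STAR))
print("true cost at w:", cost(w))
print("around w*, with / without linear term:", full_at_opt, short_at_opt)
print("around w0, with / without linear term:", full_elsewhere, short_elsewhere)
```

Around `W_STAR` the two expansions coincide because the gradient there is zero. Around the non-optimal point `w0` the version without the linear term is noticeably further from the true cost, which is why the middle term can only be dropped when expanding at the optimum.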





















